ON THE UNBIASED CHARACTER OF LIKELIHOOD-RATIO TESTS
FOR INDEPENDENCE IN NORMAL SYSTEMS 

By Joseph F. Daly 


1. Introduction. In the statistical interpretation of experimental data, the
basic assumption is, of course, that we are dealing with a sample from a statistical
population, the elements of which are characterized by the values of a number of
random variables $x^1, \ldots, x^k$. But in many cases we are in a position to assume
even more, namely, that the population has an elementary probability law
$f(x^1, \ldots, x^k; \theta_1, \ldots, \theta_h)$, where the functional form of $f(x, \theta)$ is definitely
specified, although the parameters $\theta_1, \ldots, \theta_h$ are to be left free for the moment
to have values corresponding to any point of a set $\Omega$ in an $h$-dimensional space.
Under this assumption, the problem of obtaining from the data further information
about the hypothetical distribution law $f(x, \theta)$ is considerably simplified.
For it is then equivalent to that of deciding whether or not the data support the
hypothesis that the population values of the $\theta$'s correspond to a point in a certain
subset $\omega$ of $\Omega$. For example, we may have reason to believe that the population
$K$ has a distribution law of the form


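\[ f(x^1, x^2; a^1, a^2, A_{11}, A_{12}, A_{22}) = \frac{|A_{ij}|^{1/2}}{2\pi} \exp\Bigl\{-\tfrac{1}{2} \sum_{i,j=1}^{2} A_{ij}(x^i - a^i)(x^j - a^j)\Bigr\}. \]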


Here the set $\Omega$ is composed of all parameter points $(a^1, \ldots, A_{22})$ for which the
matrix $\|A_{ij}\|$ $(i, j = 1, 2)$ is positive definite and for which $-\infty < a^i < \infty$.
We may wish to decide, on the basis of $N$ independent observations $(x^1_\alpha, x^2_\alpha)$
drawn from $K$, whether $A_{12}$ has the value zero for the population in question,
without concerning ourselves at all about the values of the remaining parameters;
in other words, we may wish to test the hypothesis $H$ that the parameter
point corresponding to $K$ lies in that subset $\omega$ of $\Omega$ for which $A_{12} = 0$. One way to
test this hypothesis is to select some (measurable) function $g(x)$ whose value can
be determined from the data, say


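\[ g(x) = \frac{\sum_{\alpha=1}^{N} (x^1_\alpha - \bar{x}^1)(x^2_\alpha - \bar{x}^2)}{\Bigl[\sum_{\alpha=1}^{N} (x^1_\alpha - \bar{x}^1)^2\Bigr]^{1/2} \Bigl[\sum_{\alpha=1}^{N} (x^2_\alpha - \bar{x}^2)^2\Bigr]^{1/2}}. \]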


Now $g(x)$ is itself a random variable, so that it has a distribution law of its own
when its constituent $x$'s are drawn from any particular population $K$. Suppose
then we choose a set of values of $g(x)$, say $S$, such that the probability is only .05
that $g(x)$ will lie in the set $S$ when the $x$'s are drawn independently from a
population $K$ for which the above hypothesis $H$ is true. Ordinarily we would
take $S$ to be of the form $|g(x)| > g_0$, and the test would then reject $H$ at the .05
probability level if the computed value of $g(x)$ came out too large in absolute value. But for all
that has been said so far, we are perfectly free to choose a different critical
region $S$, and even a different function $g(x)$. The essential elements of this
type of test are then a critical region $S$, a function of the data $g$, and a probability
level $\epsilon$, such that the probability is $\epsilon = .05$, say, that $g \in S$ when $H$ is true; in
employing the test we reject $H$ at the given probability level whenever the
sample value of $g$ falls in the critical region.
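
The three elements just listed (a statistic $g$, a critical region $S$, and a level $\epsilon$) can be illustrated by simulation. The following is only a sketch under stated assumptions: the sample size, the alternative correlation 0.6, the number of trials, and the Monte Carlo calibration of $g_0$ are all illustrative choices that do not come from the paper, and every function name is hypothetical.

```python
import math
import random


def sample_correlation(xs, ys):
    """Sample correlation coefficient g(x) of paired observations."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def draw_pairs(rng, n, rho):
    """n independent pairs from a standard bivariate normal with correlation rho."""
    xs, ys = [], []
    for _ in range(n):
        u, w = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
        xs.append(u)
        ys.append(rho * u + math.sqrt(1.0 - rho * rho) * w)
    return xs, ys


def rejection_rate(rng, n, rho, g0, trials):
    """Fraction of simulated samples whose |g| falls in the region |g| > g0."""
    hits = 0
    for _ in range(trials):
        xs, ys = draw_pairs(rng, n, rho)
        if abs(sample_correlation(xs, ys)) > g0:
            hits += 1
    return hits / trials


rng = random.Random(0)
N = 20  # illustrative sample size (an assumption)

# Calibrate g0 under H (rho = 0) as the simulated 95th percentile of |g|.
null_values = sorted(
    abs(sample_correlation(*draw_pairs(rng, N, 0.0))) for _ in range(4000)
)
g0 = null_values[int(0.95 * len(null_values))]

size = rejection_rate(rng, N, 0.0, g0, 4000)   # estimated probability of a Type I error
power = rejection_rate(rng, N, 0.6, g0, 4000)  # rejection rate under one alternative
print(g0, size, power)
```

In this sketch the rejection rate under the alternative exceeds the rate under $H$, which is the behavior the later definitions of unbiasedness formalize.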

By the very nature of the problem, any inferences we make from a sample are
subject to possible error. In the kind of test under consideration, the only error
we can commit, strictly speaking, is that of rejecting $H$ when it is true (an error of
Type I in the terminology of Neyman and Pearson [9]). The risk of such an
error is thus known in advance; for if we use the test consistently at, say, the .05
level, we know that the probability is .05 that we shall be led to reject a given
hypothesis when it is true. On the other hand, it is quite conceivable that the
test may be even less likely to reject $H$ when it is false, or more precisely, when
the true $\theta$'s correspond to a point of $\Omega$ which is not in $\omega$. In this event the test is
said to be biased. Let us make this term more definite by proposing the following
definitions:

Definition I. A test is said to be completely unbiased if it has the property
that for any probability level $\epsilon$ $(0 < \epsilon < 1)$ the probability of rejecting $H$ is greater
when the $\theta$'s correspond to a point of $\Omega - \omega$ than when they correspond to a point of $\omega$.

Definition II. A test is said to be locally unbiased if the set $\Omega$ contains a
neighborhood $U$ of $\omega$ such that for any probability level $\epsilon$ $(0 < \epsilon < 1)$ the probability
of rejecting $H$ is greater when the parameter values correspond to a point of $U - \omega$
than when they correspond to a point of $\omega$.

It is the purpose of this paper to consider the question of bias in connection
with the Neyman-Pearson method of likelihood ratios [8] as applied to the
testing of what may well be called hypotheses of independence in multivariate
normal populations. The likelihood ratio method is undoubtedly a very familiar
one, since the vast majority of tests in present statistical practice are based on
this method. But for the sake of completeness we shall outline it briefly. Let
the distribution law of the population $K$ be of the form $f(x^1, \ldots, x^k; \theta_1, \ldots, \theta_h)$,
where the $\theta$'s may correspond to any point in a set $\Omega$, and let the hypothesis $H$
to be tested be that the $\theta$'s actually belong to the subset $\omega$ of $\Omega$. Form the
likelihood function

\[ P_N(x; \theta) = \prod_{\alpha=1}^{N} f(x^1_\alpha, \ldots, x^k_\alpha; \theta_1, \ldots, \theta_h), \]

i.e., the elementary probability law of a sample of $N$ elements drawn independently
from $K$. Denote by $P_\Omega(x)$ the maximum of $P_N$ for fixed $x$ where the
$\theta$'s are allowed to range over $\Omega$, and denote by $P_\omega(x)$ the corresponding maximum
value when the $\theta$'s are restricted to $\omega$. The test criterion is then

\[ \lambda = \frac{P_\omega(x)}{P_\Omega(x)}. \]

Evidently $\lambda$ depends only on the observable quantities $x^i_\alpha$, and has the range
$0 < \lambda < 1$, with a definite probability law depending on that of the basic population
$K$. In this method the critical region $S$ is taken to be $0 < \lambda < \lambda_\epsilon$, where
$\lambda_\epsilon$ is so chosen that the probability $P\{\lambda < \lambda_\epsilon\}$ is $\epsilon$ when the parameters of $K$
correspond to a point in $\omega$. (It may be noted here that in all the cases with
which we shall have to deal the probability that $\lambda$ lies in $S$ when $H$ is true is
independent of the particular values of the $\theta$'s as long as they correspond to a
point of $\omega$.) The reason for taking the critical region to be of the form $0 < \lambda < \lambda_\epsilon$
and not, say, $\lambda'_\epsilon < \lambda < \lambda''_\epsilon$ or $\lambda_\epsilon < \lambda < 1$ may become clearer when we examine
the resulting tests for bias.
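
For the bivariate example of the Introduction, the ratio of the two maximized likelihoods can be evaluated directly. The sketch below assumes the usual bivariate normal density written in terms of its covariance matrix, and uses the standard maximum likelihood estimates (sample means, and sample second moments divided by $N$); the function names and the simulated data are illustrative assumptions. It checks numerically the well-known identity $\lambda^{2/N} = 1 - r^2$, where $r$ is the sample correlation coefficient.

```python
import math
import random


def log_likelihood(pairs, mean, cov):
    """Log-likelihood of a bivariate normal sample for a given mean and covariance."""
    (s11, s12), (_, s22) = cov
    det = s11 * s22 - s12 * s12
    total = 0.0
    for x1, x2 in pairs:
        d1, d2 = x1 - mean[0], x2 - mean[1]
        quad = (s22 * d1 * d1 - 2.0 * s12 * d1 * d2 + s11 * d2 * d2) / det
        total += -math.log(2.0 * math.pi) - 0.5 * math.log(det) - 0.5 * quad
    return total


def likelihood_ratio(pairs):
    """Return (lambda, r): the likelihood ratio for zero covariance and the sample correlation."""
    n = len(pairs)
    m1 = sum(p[0] for p in pairs) / n
    m2 = sum(p[1] for p in pairs) / n
    s11 = sum((p[0] - m1) ** 2 for p in pairs) / n
    s22 = sum((p[1] - m2) ** 2 for p in pairs) / n
    s12 = sum((p[0] - m1) * (p[1] - m2) for p in pairs) / n
    # Maximum over the full parameter set: unrestricted covariance estimate.
    ll_full = log_likelihood(pairs, (m1, m2), ((s11, s12), (s12, s22)))
    # Maximum over the restricted set: off-diagonal term forced to zero.
    ll_restricted = log_likelihood(pairs, (m1, m2), ((s11, 0.0), (0.0, s22)))
    lam = math.exp(ll_restricted - ll_full)
    r = s12 / math.sqrt(s11 * s22)
    return lam, r


rng = random.Random(1)
n_obs = 25
pairs = []
for _ in range(n_obs):
    u, w = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
    pairs.append((u, 0.5 * u + w))

lam, r = likelihood_ratio(pairs)
print(lam ** (2.0 / n_obs), 1.0 - r * r)
```

Since $\lambda$ is a decreasing function of $|r|$, the region $0 < \lambda < \lambda_\epsilon$ coincides here with a region of the form $|r| > g_0$.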

The recent work of Neyman and Pearson [10] has led them to lay considerable
stress on the importance of unbiased tests. And though their attention has been
directed mainly to the broader outlines of the theory of testing hypotheses,
they have stimulated other writers to study particular tests of great practical
importance. P. C. Tang [11] has obtained the general sampling distribution of
$1 - \lambda^{2/N}$ for what we shall call the regression problem with one dependent variate,
and has given tables for $P\{\lambda < \lambda_\epsilon\}$ (essentially proving the unbiased character
of the test) which should be extremely useful. His article also contains an
excellent discussion of the manner in which this test is related to the well known
tests of linear hypotheses [7] and to the ordinary analysis of variance. P. L.
Hsu [6] has shown that this same distribution is fundamental in the study of
Hotelling's generalized $T$ test [5] (a special but important case of what we shall
call the general regression problem), and has proved that (locally) this test is
not only unbiased but "most powerful" in a certain sense. On the other hand,
it is not true that all likelihood ratio tests are unbiased [2]. Consequently, the
knowledge that in a rather wide class of problems which arise in normal sampling
theory the method of likelihood ratios furnishes tests which are either locally or
completely unbiased would seem to be of some value, even when the exact
sampling distribution of the criterion is too complicated to tabulate.


2. The regression problem with one dependent variate. Suppose that $y$ is
known to be normally distributed about a linear function of the fixed variables
$x^1, \ldots, x^r$, so that the family of populations under consideration is characterized
by a distribution function of the form

\[ f(y \mid x, b, \sigma^2) = (2\pi\sigma^2)^{-1/2} \exp\Bigl\{-\frac{1}{2\sigma^2}\Bigl(y - \sum_{i=1}^{r} b_i x^i\Bigr)^2\Bigr\}, \tag{2.1} \]

where the set of admissible values of $\sigma^2$ and the $b$'s is

\[ \Omega: \quad 0 < \sigma^2 < \infty, \quad -\infty < b_i < \infty. \]

Let $H$ be the hypothesis that the point $(\sigma^2, b_1, \ldots, b_r)$ lies in the subset of $\Omega$
defined by

\[ \omega: \quad b_{q+1} = b_{q+2} = \cdots = b_r = 0. \]

The likelihood ratio appropriate to testing the hypothesis $H$ on the basis of $N$
$(N > r)$ independent observations drawn from such a population is then

\[ \lambda = \left[\frac{\min_{\Omega} \sum_{\alpha=1}^{N} \bigl(y_\alpha - \sum_{i=1}^{r} b_i x^i_\alpha\bigr)^2}{\min_{\omega} \sum_{\alpha=1}^{N} \bigl(y_\alpha - \sum_{i=1}^{q} b_i x^i_\alpha\bigr)^2}\right]^{N/2}, \]

with the understanding that the values of the fixed variables $x^1_\alpha, \ldots, x^r_\alpha$ associated
with the $\alpha$-th observation have been so chosen that the matrix
$\|a^{ij}\| = \bigl\|\sum_{\alpha=1}^{N} x^i_\alpha x^j_\alpha\bigr\|$ is positive definite. (The expression in the numerator is the minimum
of $\sum_{\alpha=1}^{N} \bigl(y_\alpha - \sum_{i=1}^{r} b_i x^i_\alpha\bigr)^2$ for variations of the $b$'s over $\Omega$, while the denominator
contains the corresponding minimum for variations of the $b$'s over $\omega$.)

In order to show that the test is unbiased, we shall make use of the exact
sampling distribution of the quantity

\[ \zeta = 1 - \lambda^{2/N}, \]
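first published by P. C. Tang [11]. Writing $\|A_{ab}\|$ for the inverse of the
matrix $\|a^{ab}\|$ composed of the first $q$ rows and columns of $\|a^{ij}\|$, let us put

\[ Q = \frac{1}{2\sigma^2} \sum_{c,d=q+1}^{r} \Bigl(a^{cd} - \sum_{a,b=1}^{q} a^{ca} A_{ab}\, a^{bd}\Bigr) b_c b_d. \]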
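Since the critical region $0 < \lambda < \lambda_\epsilon$ corresponds to the region $1 - \lambda_\epsilon^{2/N} = \zeta_\epsilon < \zeta < 1$,
it can then be shown that the probability of rejecting $H$ when the population
parameters have specified values $\sigma^2, b_1, \ldots, b_r$ is expressed by the series

\[ I(Q, \zeta_\epsilon) = e^{-Q} \sum_{\nu=0}^{\infty} \frac{Q^\nu}{\nu!} \int_{\zeta_\epsilon}^{1} \frac{z^{\frac{1}{2}(r-q)+\nu-1} (1-z)^{\frac{1}{2}(N-r)-1}}{B[\frac{1}{2}(r-q)+\nu, \frac{1}{2}(N-r)]}\, dz, \tag{2.2} \]

where

\[ B(u, v) = \frac{\Gamma(u)\Gamma(v)}{\Gamma(u+v)}. \]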


Now $Q$ is a positive definite quadratic form in the parameters $b_{q+1}, \ldots, b_r$, so
that it vanishes if and only if the hypothesis is true. And if $0 < \epsilon < 1$, then
$I(Q, \zeta_\epsilon)$ is a monotone increasing function of $Q$. For by differentiating (2.2)
we obtain

\[ \frac{\partial}{\partial Q} I(Q, \zeta_\epsilon) = e^{-Q} \sum_{\nu=0}^{\infty} \frac{Q^\nu}{\nu!} \int_{\zeta_\epsilon}^{1} z^{\frac{1}{2}(r-q)+\nu-1} (1-z)^{\frac{1}{2}(N-r)-1} \left\{\frac{z}{B[\frac{1}{2}(r-q)+\nu+1, \frac{1}{2}(N-r)]} - \frac{1}{B[\frac{1}{2}(r-q)+\nu, \frac{1}{2}(N-r)]}\right\} dz. \tag{2.3} \]

And from a property of incomplete Beta functions, which we shall demonstrate
in the next section, it follows that each term in the series (2.3) is positive.
Accordingly we have

Theorem I. The likelihood ratio test for the hypothesis that in a population of
type (2.1) certain of the regression coefficients are zero, i.e., the hypothesis that $y$ is
independent of the fixed variables $x^{q+1}, \ldots, x^r$, is completely unbiased.
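
The monotonicity asserted in Theorem I can be examined numerically. The sketch below evaluates a Poisson-weighted series of upper Beta tail probabilities with first parameter $\frac{1}{2}(r-q)+\nu$ and second parameter $\frac{1}{2}(N-r)$, truncated after a fixed number of terms. The values $r = 4$, $q = 2$, $N = 12$, the truncation point, and the use of Simpson's rule are illustrative assumptions, and the helper names are hypothetical.

```python
import math


def upper_beta_fraction(u, v, t, steps=2000):
    """Fraction of the Beta(u, v) mass above t, by Simpson's rule.

    Assumes u >= 1 and v >= 1, so the integrand is bounded on [t, 1].
    """
    if t <= 0.0:
        return 1.0
    if t >= 1.0:
        return 0.0
    norm = math.exp(math.lgamma(u + v) - math.lgamma(u) - math.lgamma(v))
    h = (1.0 - t) / steps  # steps is even, as Simpson's rule requires
    total = 0.0
    for i in range(steps + 1):
        z = min(1.0, t + i * h)  # guard against rounding past 1
        weight = 1 if i in (0, steps) else (4 if i % 2 else 2)
        total += weight * z ** (u - 1.0) * (1.0 - z) ** (v - 1.0)
    return norm * total * h / 3.0


def rejection_probability(Q, zeta_eps, r, q, N, terms=80):
    """Truncated Poisson mixture of upper Beta tails, as a function of Q."""
    u0, v = 0.5 * (r - q), 0.5 * (N - r)
    total = 0.0
    for nu in range(terms):
        if Q > 0.0:
            weight = math.exp(-Q + nu * math.log(Q) - math.lgamma(nu + 1))
        else:
            weight = 1.0 if nu == 0 else 0.0
        if weight == 0.0:
            continue
        total += weight * upper_beta_fraction(u0 + nu, v, zeta_eps)
    return total


def critical_value(eps, r, q, N):
    """Bisection for the threshold whose rejection probability under H equals eps."""
    lo, hi = 0.0, 1.0
    for _ in range(60):
        mid = 0.5 * (lo + hi)
        if upper_beta_fraction(0.5 * (r - q), 0.5 * (N - r), mid) > eps:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


r, q, N = 4, 2, 12  # illustrative values (assumptions)
zeta_eps = critical_value(0.05, r, q, N)
powers = [rejection_probability(Q, zeta_eps, r, q, N) for Q in (0.0, 0.5, 1.0, 2.0, 4.0)]
print(zeta_eps, powers)
```

With these values the computed probabilities start at the level .05 for $Q = 0$ and increase with $Q$, in agreement with the theorem.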

Wilks [15] has noted that the ordinary analysis of variance and covariance 
amounts essentially to testing hypotheses of this nature by means of the function 



Consequently such tests are also completely unbiased, since the region of rejec¬ 
tion is then taken to be of the form f > f £ . 

3. An inequality relating to incomplete Beta functions. Let us write

\[ B(u, v; t) = \int_{t}^{1} z^{u-1}(1-z)^{v-1}\, dz \qquad (0 \le t \le 1). \]

Now,

\[ \int_{t}^{1} z^{u-1}(1-z)^{v}\, dz = \left[\frac{z^{u}(1-z)^{v}}{u}\right]_{t}^{1} + \frac{v}{u} \int_{t}^{1} z^{u}(1-z)^{v-1}\, dz. \]

The integrated term on the right is non-positive, so that

\[ B(u, v+1; t) \le \frac{v}{u}\, B(u+1, v; t), \tag{3.1} \]

in which the equality holds if and only if $t = 0$ or $t = 1$. Again, since

\[ z^{u}(1-z)^{v-1} + z^{u-1}(1-z)^{v} = z^{u-1}(1-z)^{v-1}, \]

we have

\[ B(u+1, v; t) + B(u, v+1; t) = B(u, v; t). \tag{3.2} \]

Combining these results, we find that

\[ \frac{u+v}{u}\, B(u+1, v; t) \ge B(u, v; t), \tag{3.3} \]

with equality only when $t = 0$ or $t = 1$. Since $B(u+1, v) = \frac{u}{u+v} B(u, v)$, we have

Lemma 1: If $0 < t < 1$, then

\[ \frac{B(u+1, v; t)}{B(u+1, v)} > \frac{B(u, v; t)}{B(u, v)}. \]
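
A numerical spot check of the lemma may be helpful. The sketch below assumes that the incomplete Beta function is the integral of $z^{u-1}(1-z)^{v-1}$ taken from $t$ to 1, evaluates it by Simpson's rule on an illustrative grid of parameter values (all at least 1, so that the integrand stays bounded), and reports the smallest gap between the two normalized ratios together with the largest violation of the additivity relation between $B(u+1, v; t)$, $B(u, v+1; t)$ and $B(u, v; t)$. The function names and the grid are assumptions made for the illustration.

```python
import math


def upper_incomplete_beta(u, v, t, steps=2000):
    """Integral of z^(u-1) (1-z)^(v-1) from t to 1, by Simpson's rule."""
    h = (1.0 - t) / steps
    total = 0.0
    for i in range(steps + 1):
        z = min(1.0, t + i * h)  # guard against rounding past 1
        weight = 1 if i in (0, steps) else (4 if i % 2 else 2)
        total += weight * z ** (u - 1.0) * (1.0 - z) ** (v - 1.0)
    return total * h / 3.0


def complete_beta(u, v):
    """Complete Beta function via log-gamma."""
    return math.exp(math.lgamma(u) + math.lgamma(v) - math.lgamma(u + v))


def lemma_gap(u, v, t):
    """Normalized tail with first parameter u+1 minus normalized tail with parameter u."""
    raised = upper_incomplete_beta(u + 1.0, v, t) / complete_beta(u + 1.0, v)
    base = upper_incomplete_beta(u, v, t) / complete_beta(u, v)
    return raised - base


def additivity_gap(u, v, t):
    """Residual of B(u+1, v; t) + B(u, v+1; t) - B(u, v; t)."""
    return (
        upper_incomplete_beta(u + 1.0, v, t)
        + upper_incomplete_beta(u, v + 1.0, t)
        - upper_incomplete_beta(u, v, t)
    )


grid = [
    (u, v, t)
    for u in (1.0, 1.5, 2.5, 4.0)
    for v in (1.0, 2.0, 3.5)
    for t in (0.1, 0.3, 0.5, 0.7, 0.9)
]
smallest_lemma_gap = min(lemma_gap(*point) for point in grid)
largest_additivity_gap = max(abs(additivity_gap(*point)) for point in grid)
print(smallest_lemma_gap, largest_additivity_gap)
```

On this grid the smallest gap is positive and the additivity residual is at the level of rounding error.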

4. The multiple correlation coefficient. Suppose the distribution law of the
underlying population is known to be of the form

\[ f(x^1, \ldots, x^t \mid x^{t+1}, \ldots, x^m) = \frac{|B_{ij}|^{1/2}}{\pi^{t/2}}\, e^{-B_{ij}(x^i - a^i - C^i_p x^p)(x^j - a^j - C^j_q x^q)}. \tag{4.1} \]

The indices appearing in this expression take the values $i, j = 1, \ldots, t$ and
$p, q = t + 1, \ldots, m$. The summation convention of repeated indices will be
used; for example, $\sum_p C^i_p x^p$ will be denoted by $C^i_p x^p$. We shall also have occasion
to use indices $r, s$ with the range $r, s = 1, \ldots, m$. The set of possible values
of the $a$'s, $B$'s, and $C$'s is

\[ \Omega: \quad \|B_{ij}\| \text{ positive definite}; \quad -\infty < a^i < \infty; \quad -\infty < C^i_p < \infty. \]

We shall consider the $\lambda$ test for the hypothesis $H$ that $x^1$ is independent of the
remaining variables $x^2, \ldots, x^m$, i.e., that the parameters belong to that subset of
$\Omega$ defined by

\[ \omega: \quad B_{1k} = 0 \quad (k = 2, \ldots, t); \qquad C^1_p = 0. \]


Let us write $v^{rs} = \sum_{\alpha=1}^{N} (x^r_\alpha - \bar{x}^r)(x^s_\alpha - \bar{x}^s)$, and assume that the values of the
fixed variables $x^p_\alpha$ have been so selected that the matrix $\|v^{pq}\|$ is positive definite.
The likelihood ratio can then be expressed in the form

\[ \lambda = \left[\frac{|v^{rs}|}{v^{11}\, \bar{v}_{11}}\right]^{N/2} = (1 - R^2)^{N/2}, \]

where $\bar{v}_{11}$ is the complement of $v^{11}$ in the determinant $|v^{rs}|$. If $N > m + 1$,
the general sampling distribution of $R^2$ (the multiple correlation coefficient
between $x^1$ and $m - 1$ other variates), for this case in which $x^2, \ldots, x^t$ are subject
to sampling variation and the remainder are fixed, is


_ (1 - 

(4 2 ) mN-m)] 

x v v Wm - p 2 )V)W + 'rW — 1) + m + v] ^ 

U zi - l) + M ]r[Hm - l) + m + v] a{K} * 

where 


i 2 

1 - p = 


IB., 


BuB 


,11 > 


iv’-jp C\c\, 


|5 ,y |l = l|B«ir l . 


This distribution was first obtained by Wilks [13], although Fisher [3] had 
previously treated the two extreme cases in which (1) all independent variables 
are subject to sampling fluctuation, and (2) all independent variables are fixed. 
To simplify the presentation, let us put p = P 2 ,y = iy 2 and It = ft 2 , and note 

that $\gamma = 0$ if and only if $C^1_p = 0$ $(p = t + 1, \ldots, m)$, while $\rho = 0$ if and only if
$B_{1k} = 0$ $(k = 2, \ldots, t)$, so that $\gamma = \rho = 0$ means that the hypothesis $H$ is true.
On any alternative hypothesis, one or the other or both of these quantities will
be positive. Let the region of rejection be taken to be

\[ R_\epsilon < R < 1, \]

which corresponds to

\[ 0 < \lambda < (1 - R_\epsilon)^{N/2}. \]

The probability of rejecting $H$ is then

\[ I(\rho, \gamma, R_\epsilon) = e^{-\gamma} \sum_{\mu,\nu=0}^{\infty} \frac{\gamma^\mu \rho^\nu}{\mu!\,\nu!}\, (1-\rho)^{\frac{1}{2}(N-1)+\mu}\, \frac{\Gamma[\frac{1}{2}(N-1)+\mu+\nu]}{\Gamma[\frac{1}{2}(N-1)+\mu]} \int_{R_\epsilon}^{1} \frac{R^{\frac{1}{2}(m-1)+\mu+\nu-1} (1-R)^{\frac{1}{2}(N-m)-1}}{B[\frac{1}{2}(m-1)+\mu+\nu, \frac{1}{2}(N-m)]}\, dR. \tag{4.4} \]

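We shall show that $I(\rho, \gamma, R_\epsilon)$ is a strictly monotone increasing function of $\rho$
for each $\gamma$, and that $I(0, \gamma, R_\epsilon)$ is a strictly monotone increasing function of $\gamma$.

First consider $\partial I / \partial \rho$. We can write (4.4) in the form

\[ I(\rho, \gamma, R_\epsilon) = e^{-\gamma} \sum_{\mu=0}^{\infty} \frac{\gamma^\mu}{\mu!}\, \frac{1}{\Gamma[\frac{1}{2}(N-1)+\mu]} \left(\sum_{\nu=0}^{\infty} \frac{\rho^\nu}{\nu!}\, (1-\rho)^{\frac{1}{2}(N-1)+\mu}\, \varphi_{\mu,\nu}\right), \]

where

\[ \varphi_{\mu,\nu} = \Gamma[\tfrac{1}{2}(N-1)+\mu+\nu]\; \frac{B[\frac{1}{2}(m-1)+\mu+\nu, \frac{1}{2}(N-m); R_\epsilon]}{B[\frac{1}{2}(m-1)+\mu+\nu, \frac{1}{2}(N-m)]}. \]

Then, formally,

\[ \frac{\partial}{\partial \rho} \left(\sum_{\nu=0}^{\infty} \frac{\rho^\nu}{\nu!}\, (1-\rho)^{\frac{1}{2}(N-1)+\mu}\, \varphi_{\mu,\nu}\right) = \sum_{\nu=0}^{\infty} \frac{\rho^\nu}{\nu!}\, (1-\rho)^{\frac{1}{2}(N-1)+\mu}\, \varphi_{\mu,\nu+1} - \sum_{\nu=0}^{\infty} \frac{\rho^\nu}{\nu!}\, (1-\rho)^{\frac{1}{2}(N-1)+\mu-1}\, [\tfrac{1}{2}(N-1)+\mu]\, \varphi_{\mu,\nu}. \]

Taking out the factor $(1-\rho)^{\frac{1}{2}(N-1)+\mu-1}$, we have left

\[ \sum_{\nu=0}^{\infty} \frac{\rho^\nu}{\nu!}\, \varphi_{\mu,\nu+1} - \sum_{\nu=0}^{\infty} \frac{\rho^{\nu+1}}{\nu!}\, \varphi_{\mu,\nu+1} - \sum_{\nu=0}^{\infty} \frac{\rho^\nu}{\nu!}\, [\tfrac{1}{2}(N-1)+\mu]\, \varphi_{\mu,\nu} = \sum_{\nu=0}^{\infty} \frac{\rho^\nu}{\nu!} \bigl\{\varphi_{\mu,\nu+1} - [\tfrac{1}{2}(N-1)+\mu+\nu]\, \varphi_{\mu,\nu}\bigr\}, \]

the second sum having been rewritten as $\sum_{\nu} \nu\, \rho^\nu \varphi_{\mu,\nu} / \nu!$ by a shift of the index.
And the expression $\varphi_{\mu,\nu+1} - [\frac{1}{2}(N-1)+\mu+\nu]\, \varphi_{\mu,\nu}$ is the same as

\[ \Gamma[\tfrac{1}{2}(N-1)+\mu+\nu+1] \left\{\frac{B[\frac{1}{2}(m-1)+\mu+\nu+1, \frac{1}{2}(N-m); R_\epsilon]}{B[\frac{1}{2}(m-1)+\mu+\nu+1, \frac{1}{2}(N-m)]} - \frac{B[\frac{1}{2}(m-1)+\mu+\nu, \frac{1}{2}(N-m); R_\epsilon]}{B[\frac{1}{2}(m-1)+\mu+\nu, \frac{1}{2}(N-m)]}\right\}, \]

and is therefore positive, by Lemma 1. Consequently

\[ \frac{\partial}{\partial \rho} I(\rho, \gamma, R_\epsilon) \ge 0, \]

with equality holding only if $\rho = 1$, or if the critical region is taken as the whole
interval or the null set.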
We have yet to investigate $\frac{\partial}{\partial \gamma} I(0, \gamma, R_\epsilon)$.
In this case (4.4) becomes

\[ I(0, \gamma, R_\epsilon) = e^{-\gamma} \sum_{\mu=0}^{\infty} \frac{\gamma^\mu}{\mu!}\, \frac{B[\frac{1}{2}(m-1)+\mu, \frac{1}{2}(N-m); R_\epsilon]}{B[\frac{1}{2}(m-1)+\mu, \frac{1}{2}(N-m)]}. \tag{4.5} \]

(Note that this agrees with (2.2) if we make use of the relations $r = m$, $q = 1$,
and $B_{11} = 1/(2\sigma^2)$.) We then obtain

\[ \frac{\partial}{\partial \gamma} I(0, \gamma, R_\epsilon) = e^{-\gamma} \sum_{\mu=0}^{\infty} \frac{\gamma^\mu}{\mu!} \left\{\frac{B[\frac{1}{2}(m-1)+\mu+1, \frac{1}{2}(N-m); R_\epsilon]}{B[\frac{1}{2}(m-1)+\mu+1, \frac{1}{2}(N-m)]} - \frac{B[\frac{1}{2}(m-1)+\mu, \frac{1}{2}(N-m); R_\epsilon]}{B[\frac{1}{2}(m-1)+\mu, \frac{1}{2}(N-m)]}\right\}, \]

which the lemma shows to be positive when $0 < R_\epsilon < 1$.

This concludes the proof of

Theorem II. If the underlying population has a distribution law of the form
(4.1), then the likelihood ratio test for the hypothesis that $x^1$ is independent of $x^2, \ldots, x^m$,
where $x^{t+1}, \ldots, x^m$ are fixed and $x^2, \ldots, x^t$ are subject to sampling variation,
is completely unbiased.
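
The two monotonicity statements behind Theorem II can also be examined numerically. The sketch below evaluates a truncated double series in which a Poisson weight in $\gamma$ and a negative binomial weight in $\rho$ multiply the upper tail of a Beta distribution with parameters $\frac{1}{2}(m-1)+\mu+\nu$ and $\frac{1}{2}(N-m)$. The values $N = 11$, $m = 3$, the truncation point, the Simpson integration and all function names are illustrative assumptions; the closed form used for $R_\epsilon$ is valid only because $\frac{1}{2}(m-1) = 1$ for this choice of $m$.

```python
import math


def upper_beta_fraction(c, d, t, steps=1000):
    """Fraction of the Beta(c, d) mass above t (Simpson's rule; assumes c, d >= 1)."""
    norm = math.exp(math.lgamma(c + d) - math.lgamma(c) - math.lgamma(d))
    h = (1.0 - t) / steps
    total = 0.0
    for i in range(steps + 1):
        z = min(1.0, t + i * h)  # guard against rounding past 1
        weight = 1 if i in (0, steps) else (4 if i % 2 else 2)
        total += weight * z ** (c - 1.0) * (1.0 - z) ** (d - 1.0)
    return norm * total * h / 3.0


def log_power(x, k):
    """log(x**k), with the convention 0**0 = 1."""
    if k == 0:
        return 0.0
    return k * math.log(x) if x > 0.0 else float("-inf")


def rejection_probability(rho, gamma, r_eps, N, m, terms=60):
    """Truncated double series: Poisson in gamma, negative binomial in rho."""
    d = 0.5 * (N - m)
    # The Beta tail depends on mu and nu only through their sum.
    tails = [upper_beta_fraction(0.5 * (m - 1) + k, d, r_eps) for k in range(2 * terms)]
    total = 0.0
    for mu in range(terms):
        a = 0.5 * (N - 1) + mu
        log_poisson = -gamma + log_power(gamma, mu) - math.lgamma(mu + 1)
        for nu in range(terms):
            log_weight = (
                log_poisson
                + log_power(rho, nu) - math.lgamma(nu + 1)
                + a * math.log(1.0 - rho)
                + math.lgamma(a + nu) - math.lgamma(a)
            )
            total += math.exp(log_weight) * tails[mu + nu]
    return total


N, m = 11, 3  # illustrative values (assumptions)
r_eps = 1.0 - 0.05 ** (1.0 / (0.5 * (N - m)))  # level .05 under H when (m - 1)/2 = 1

by_rho = [rejection_probability(rho, 0.5, r_eps, N, m) for rho in (0.0, 0.2, 0.4)]
by_gamma = [rejection_probability(0.0, g, r_eps, N, m) for g in (0.0, 0.5, 1.0, 2.0)]
size = by_gamma[0]
print(size, by_rho, by_gamma)
```

For these values the computed probability equals the level when $\rho = \gamma = 0$ and increases both in $\rho$ (with $\gamma$ held fixed) and in $\gamma$ (with $\rho = 0$).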


5. Mutual independence of several sets of random variables.¹ Let the distribution
law of the $m$-variate population be of the form

\[ \frac{|B_{ij}|^{1/2}}{\pi^{m/2}}\, e^{-B_{ij}(x^i - a^i)(x^j - a^j)}. \tag{5.1} \]

Here $\Omega$ is the set: $\|B_{ij}\|$ positive definite; $-\infty < a^i < \infty$. Suppose we wish to
test the hypothesis $H_I$ that the variates $(x^1, \ldots, x^{m_1}); \ldots; (x^{m_{p-1}+1}, \ldots, x^{m_p})$
are mutually independent in sets [14], where $0 = m_0 < m_1 < \cdots < m_p = m$.
Then the $\omega$ set is that defined by

\[ \|B_{ij}\| = \|B_{i_1 j_1}\| \dotplus \cdots \dotplus \|B_{i_p j_p}\| = \|B_1\| \dotplus \cdots \dotplus \|B_p\|, \]

that is, we have $B_{ij} = 0$ unless the indices $i$ and $j$ both relate to the same set of
variates.

Associated with the population of random samples $O_N$ $(N > m + 1)$ drawn
from a universe characterized by (5.1), we have the distribution function

\[ P(x; B, a) = \frac{|B_{ij}|^{N/2}}{\pi^{mN/2}} \exp\Bigl\{-\sum_{\alpha=1}^{N} B_{ij}(x^i_\alpha - a^i)(x^j_\alpha - a^j)\Bigr\}. \]

The maximum of $P$ with respect to variations of the parameters $B_{ij}$, $a^i$ in $\Omega$ is
elimination in '“otvention”" 8 ^ ab ° V ° b<lW indicates 
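\[ P_\Omega = \Bigl(\frac{N}{2\pi e}\Bigr)^{mN/2} v^{-N/2}, \]

where $v$ is the determinant $|v^{ij}|$ and

\[ v^{ij} = \sum_{\alpha=1}^{N} (x^i_\alpha - \bar{x}^i)(x^j_\alpha - \bar{x}^j). \]

And the maximum when the parameters are restricted to $\omega$ is

\[ P_\omega = \Bigl(\frac{N}{2\pi e}\Bigr)^{mN/2} [v_1 \cdots v_p]^{-N/2}, \]

where $v_h$ stands for the determinant of the $v$'s connected with the $h$-th set of $x$'s.
Thus the appropriate likelihood ratio is given by

\[ \lambda_I = \Bigl(\frac{v}{v_1 \cdots v_p}\Bigr)^{N/2}. \]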
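
The criterion for independence in sets is a ratio of determinants of corrected sums of products, and it can be computed directly from a data matrix. The sketch below is only an illustration: the simulated data, the block sizes (2, 2), and the function names are assumptions. As a sanity check it also verifies a property that follows from the determinant form, namely that replacing the variables of one set by non-singular linear combinations of variables of the same set leaves the ratio unchanged.

```python
import random


def determinant(matrix):
    """Determinant by Gaussian elimination with partial pivoting."""
    a = [row[:] for row in matrix]
    n = len(a)
    det = 1.0
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        if a[pivot][col] == 0.0:
            return 0.0
        if pivot != col:
            a[col], a[pivot] = a[pivot], a[col]
            det = -det
        det *= a[col][col]
        for r in range(col + 1, n):
            factor = a[r][col] / a[col][col]
            for c in range(col, n):
                a[r][c] -= factor * a[col][c]
    return det


def scatter_matrix(rows):
    """Matrix of sums of products of deviations from the sample means."""
    n, m = len(rows), len(rows[0])
    means = [sum(row[i] for row in rows) / n for i in range(m)]
    return [
        [sum((row[i] - means[i]) * (row[j] - means[j]) for row in rows) for j in range(m)]
        for i in range(m)
    ]


def independence_criterion(rows, set_sizes):
    """Ratio v / (v_1 ... v_p), i.e. the likelihood ratio raised to the power 2/N."""
    v = scatter_matrix(rows)
    ratio = determinant(v)
    start = 0
    for size in set_sizes:
        block = [r[start:start + size] for r in v[start:start + size]]
        ratio /= determinant(block)
        start += size
    return ratio


rng = random.Random(2)
rows = [[rng.gauss(0.0, 1.0) for _ in range(4)] for _ in range(30)]
criterion = independence_criterion(rows, (2, 2))

# Non-singular linear change of variables inside the first set only.
transformed = [[2.0 * x1 + x2, x1 - 3.0 * x2, x3, x4] for x1, x2, x3, x4 in rows]
transformed_criterion = independence_criterion(transformed, (2, 2))
print(criterion, transformed_criterion)
```

The ratio always lies between 0 and 1, with values near 1 indicating little sample dependence between the sets.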

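It is easy to see that the value of $\lambda_I$ is unaltered if we replace $x^i - a^i$ by $x^i$,
so that we can express the probability that $\lambda_I$ will lie between 0 and $\lambda_\epsilon$ in the form

\[ I(B, \lambda_\epsilon) = \frac{|B_{ij}|^{N/2}}{\pi^{mN/2}} \int_{\lambda_I < \lambda_\epsilon} \exp\Bigl\{-\sum_{\alpha=1}^{N} B_{ij}\, x^i_\alpha x^j_\alpha\Bigr\}\, dx^1_1 \cdots dx^m_N. \]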

Furthermore, $\lambda_I$ is invariant under the operation of replacing any $x$ by a linear
combination of $x$'s belonging to the same set. And since the assumption that
$\|B_{ij}\|$ is positive definite implies that the matrices $\|B_{i_h j_h}\|$ have the same
property, we can transform the $x$'s in each set among themselves by orthogonal
transformations in such a way as to reduce each of the expressions

\[ B_{i_h j_h}\, x^{i_h} x^{j_h} \]

to sums of squares. Thus we have

N 


X 

(5.2) I(B, Xj = J e dxl • ■ • dx 1 !} = 1(B*, Xj, 


where 

(5.3) 

(5.4) 


— <x ,p-E'h fl k,<X ]y 

Btu = 0 


(hp , i,, , jn , K — Wji— i ~f* 1, • • ■, wip), 

U ^ j,, 


and the subscripts on the indices indicate the sets of values over which they
range; e.g., $i_2$ runs over the numbers corresponding to the columns of the matrix
$\|B_2\|$. From (5.3) and (5.4) it is clear that $\|B^*_{ij}\|$ reduces to a diagonal
matrix when $H$ is true.

In order to show that the test is locally unbiased, we may consider the derivatives




d 2 

ti,dB 


* 

hgkr 



(m t* v , a t) 
(£Pj , o, (~£*-\ _. , 

' \ 5 -B.y, dBh.kJo ~ ~A v,<r ?£ T ) 

‘ he ““" d “ is *■*“ «- - ^ a. *.. Thu , 

\ /o ^ iNm L<\.^' l Xa " x ’ a ’ e a “ 1 *“dx, 

And since whenevefthfpdnt^f 8 77 diag0] J al form associated with //. 
- A< > 80 also is the point x\ !,. . ' / »" * « **••••,*£ is ia the region 

’ 1 ’ " ' > £*;•••, a# it follows that 

A 

Cm a). 


afifT > x *) = 0 » 

»g/K , ' 


Similar considerations show that tk„ 

a show that the non-repeated second derivatives 

3 2 


81 “• /(< u - 4 £ L (I •»«’) (£ **) 


must vanish. 

S;5SHH~ s “=-"“"rrT 
‘ ainsss" “ 

(5.5) 

"777 ~w 1 n 

'“-‘-‘"IlrBW-o, 8 *'•••*“. 

I 

(Because of the relation „« * 

G(B, JV - l, w ) „ 

F(fi, ^ - 1 ( W ) = 

With the aid of ( 5 . 5 ) we h |] 

J.we shall now compute the moments 

m ! y% 

h m 0, 1, ... , 
for the case in which the matrix $\|B_{ij}\|$ has the form


(5.6) 


Bn .. 

• -Bltni 

0 

■ ■ ■ 0B, m 

. 

. 

. 

0 

5 m ,i • ■ 


6 

... 0 

o .. 

. 0 



0 

* 


II51| 

B m i 0 .. 

. 0 




where $\|\bar{B}\|$ stands for $\|B_2\| \dotplus \cdots \dotplus \|B_p\|$, and all other $B$'s, except those
indicated, are zero. Let us designate by $(\bar{v})$ the set of $v^{ij}$ which correspond to
the rows and columns of $\bar{B}$, and by $(v - \bar{v})$ the remaining $v$'s. We then remark
that the result of integrating (5.5) with respect to the $v$'s in $(v - \bar{v})$ is to reduce
it to the corresponding distribution for the variables in the set $\bar{v}$, thus:


(5.7) 


G(B, N-l,tn) j V(B, N - 1, m) d(v - v) 


— G(B , N — 1, m — mi)V(B, N — 1 ,m — mi) l 


where || B k i || is the inverse of the matrix obtained by inverting || B l} ||, and 
striking out the first m, rows and columns, that is 


B — B , (k, l = titi + 1, * • • , m). 


Then, 


G(B , N - 1, m) f 



V(B, N - 1, m) d(v - v) 


can be written as 


(5-8) 


1 +“. »>/<*-•;* 

XV(B,N-l+2h,m)d(v- t>) 


X V} 1 ' • •• v„ h V{B, 17 — 1 + 2 h, tn — nil). 

It can be seen from (5.6) that 

II £ ll“p, 11+ +11^,11 + 115,11 


since of all the rows and columns of $\|B_{ij}\|$ which are involved in $\|\bar{B}\|$ it is
only the last in which a non-zero element appears outside of the blocks $\|B_2\|$,
..., $\|B_p\|$. Consequently, the $v$'s corresponding to the determinants $v_2$, ...,
$v_p$ are independently distributed, so that if in (5.8) we integrate out all the
remaining $v$'s but these, we shall be left with a product of factors

G(B, N - 1, m) ff G(B ,, N - 1 + 2h , k,) 

G(B, N - 1 + 2h, m)' T~\ G(B t ,N - 1, h) 

X Q(B„ N - 1, k t )vT h V(B t , N- 1 + 2 h, k,) 

x Qfo,jV ~ 1 + N _ i, k v )tfV(Bp, N - 1 + 2/t, A V ), 

G(B p ,N-l,kp) 

where A* stands for the order of || 5* ||. And this, when integrated with respect 
to the v’s in Vi , ■ • , v P , yields 

Q(B, N - 1, m) ff G(B t ,N -1 + 2fc, A t ) v <?(£„ AA — 1 + 2h, k„) 
G(B , A- l + 2A,m)' fi AT - 1, A,) " “ 0(5*, JV - 1, A*)' ’ 


which, because of the definition of the G’s, reduces to 


n 


Tm - 0 + h] 
r rn - *)] 


■nn 


vim - m 

r[i(AT - i) + h] 


X B~ h B\ ... Bp. . 


Denoting the product of ratios of T’s by K >,, and recalling the form of jj B tl ||, 
we therefore have 


(5.9) 

.with 


e\ t -^1 = RhB\B'- 

_Dj • • ■ V p _ 


I B' || = 


Bn • • • Bi mi 0 ... 0Bj m 
0 


B mi 1 • ■ 
0 .. 

6 

■BmiO • • 


* Bmim\ 0 ■ ■ • 0 

. 0 

115*11 


But it is not difficult to see that under the condition (5.6), the matrix )| B p || 
is also the inverse of the matrix obtained by striking out the first m rows and 
columns in the inverse of || B' |[. Making use of this relation, wo can apply the 
Jacobi theorem to (5 9), and put that, expression in the form 


where Ij Bi || is the matrix in the upper left hand corner of || B' ||, namely 
Let the subscript $\beta$ on a $B$ stand for the result of replacing $B_{i_1 j_1}$ by $B_{i_1 j_1} + \beta_{i_1 j_1}$.
For sufficiently small values of the $\beta$'s the matrix $\|B_{ij\beta}\|$ will still be
positive definite, so that we shall have


771 

■‘“’IIiW-i)] 


/ 


4 


^ = K hB -\ 


which we can put in the form 


(5.10) 


K 


i 


h 

V P 


v l(N-m)-ld V = _ El 




Wilks [13] has shown how to generate moments of determinants by the device 
of replacing f} iin by fi HU -f f n { n , and integrating with respect to the £’s from 
— oo to oo. Applying this process 2h times to the left hand side of (5 10) gives 

*"* K ' / CrvrJ 7lB >• N - 7 m) *■ 

which when multiplied by 7r "‘ lk Bi (/ ' r - 1) yields 

E[0? l *) h ] 

when the /3’s are set equal to zero. 

To obtain the value of this expression, we may perform the same operations 
on the right hand side of (5 10) But before so doing, we shall put Bp in a 
more convenient form. We have 


Bp = B v . B - B\ m .M mm B$, 

where B is the inverse element of B mm in ]] B || , and B (J 1 is the cofactor of 
B n p in Bip , the result being obtained by expanding Bp according to minors of 
the first row and first column. Similarly, 


(5.11) 

From (5.11) we have 


B = B V B-Bl m ■BB mm .B’i n . 



n2 5mm -*^1 
ni m n 


Bi 


} 


so that if we put B ■ B x 1 •. B~ l = A, we find that 


B P = Bip-Bll - 

l Bi Bip, 
Thus the result of multiplying (5.10) through by B Hn 11 (where no $\beta$'s are
substituted in this determinant) can be put in the form 


(5.12) 


/ B Y " d 
\B t ■ ■ • Bp) 


1~ an(l-A)f* 

My 


B I 


-lOv-U 


B 


13 . 


Expanding the expression in curled brackets, wc get 




v r[*(ff -!) + >■) n. 

£o V\T[W - 1)] 1 



£-[Jur-n aw 


(1 - A)'. 


If we let Bipi stand for the result of replacing B n by B u - l in B ia , wc. can write 
this as 


i 

(5.13) 


* i(ff-i) v 1 B[i(N — 1) + v] 

a drp-1)] 


(1 - AY(Bi u rBi w ~°+’ 


B{i(N — 1) + A] cT rj-lHAT-D+ii 

T[$(N - 1) + h+~v] 90- w ' 


the derivatives being evaluated at t = 0. 

Now Wilks’ results show that the operation of introducing 0,^, + into 
B J( )i to replace /3 n „ and integrating with respect to the f's, wdicn repeated 2 h 
times on i?i]« <W-1)+Al , produces 


v mih Bii (N ~ n 


TV raosr - i)} 
mN -i) + h\ 


when the p’s are finally set equal to zero. Reversing the order of summation, 
differentiation and integration in (5.13), we thus obtain 


."•i'i JJ r[j(iV — t)] a j(at-d y' r[J(AT — 1) + v] 


(5.14) 


Now 


r m - i) + h] 


7=0 virtKN - 1)1 


x (l - a yiB^rB^- 1 ^ - D 4- h ] (£ u\ 

mN-i) + h + v }\dt’* u k 



r H(N - l) + „] 

nm ~ i)] 


(B^ a yJ3T ,l(Jr - ,H ' , , 


so that (6.14) becomes 


»llk fj m N - i)] 


.-t mN -i) + hy 


il(JM) 


f rtKA r -1) ] 

t=o r!r(KA r - if] 


x (i — Ai 1 ' r li(N — i) + h] r[&(N — i) + r] 

m{N - i) + h + r[¥(iV — l)] 
From this it appears that the $h$-th moment of $\lambda_I^{2/N}$ is given by
rriYNWAVo _ TT r[KA r — *) + h] -fi Yi — i)] 

l(x } 1 " ii rp - ol 'fi fi r»(JV -») + ft] 

r[*(tf - l) + v] 

__ U — 

v_o 


(5.15) 


X A i( * 11 Z (1 ~ A) 


X 


rir[|(JV - 1)] 

r[$(iv - i) + v] r[i(N - l) + ft] 


r[*(2V - l) + ft + f] r[KlV - l)] 


A considerable amount of cancellation will take place in (5.15), for $m$ is greater
than any $k_i$. Suppose the largest $k_i$ is $k_{i'}$. Then we can cancel its product
into the first one, with the assurance that there will be at least one factor


(5 16) 


nm - 1 )] 

TIW - 1) + ft] 


to cancel the corresponding factor under the summation sign. Hence we have 


(5.17) 


®[(X 2/V ) A ] 


tt r[|(fv — i) + h] -/r, jj„ r[$(N — i)] 
Jtf+i m(N - »)] ,-i T[h(N - i) + ft] 


v y> /i _ .y r[|(A r —!)+>'] rIKAf — 1) + v] 

* ZA u J v\v[h(N-i)] 'r[*(JV - l) + h + „]' 


where II' indicates that t' has been omitted, and II" indicates that one factor 
(5 16) has been cancelled. Then we can take out the factor i = m in the first 
product, putting it under the summation sign, where, together with the final 
factor in each term of the sum, it gives rise to the combination 


rfoov - l) + r] r[i(w - to) + ft]rfo(m - l) + ?] 
r[i(iV - m)]T[\(jn - 1) + v]‘ r[4(iV - 1) + h + y] 


After making this reduction, we obtain 


(5.18) 


jE[(X 2 'T] 


TT r m -l)+h]fr,ft, Tim - t)] 

aUl T[$(N - i )] bk r [UN - ») + ft] 


V li(iv-i) V fi a v r[KfV — 1) + v] B[§(iV - m) + h, \{m - 1) + v] 
t=l K ’ Firft(2V - 1)] B lh(N ~ to), Um ~ 1) + V] 


The products of ratios in the first part of (5.18) are of the type discussed by
Wilks in connection with integral equations of type B [12]. It follows from his
results that $\lambda_I^{2/N}$ is distributed like the product

\[ z \cdot \theta_1 \cdots \theta_{m'} \qquad (m' = m - k_{i'} - 1), \]

where $z$ and the $\theta$'s are independently distributed, with the distribution of the
$\theta$'s given by

\[ M(\theta) = \prod_{i=1}^{m'} \frac{\Gamma(c_i)}{\Gamma(b_i)\,\Gamma(c_i - b_i)}\; \theta_i^{\,b_i - 1} (1 - \theta_i)^{c_i - b_i - 1}, \]

where the $b_i$ and $c_i$ are constants which depend on $N$, $m$, and the sizes of the
blocks, but not on $A$, and the distribution of $z$ is given by

\[ F(z) = A^{\frac{1}{2}(N-1)} \sum_{\nu=0}^{\infty} (1 - A)^\nu\, \frac{\Gamma[\frac{1}{2}(N-1)+\nu]}{\nu!\, \Gamma[\frac{1}{2}(N-1)]}\; \frac{z^{\frac{1}{2}(N-m)-1} (1-z)^{\frac{1}{2}(m-1)+\nu-1}}{B[\frac{1}{2}(N-m), \frac{1}{2}(m-1)+\nu]}. \]


Consequently, the probability that $\lambda_I$ lies between zero and $\lambda_\epsilon$ is

\[ I(A, \lambda_\epsilon) = \int_{S} F(z)\, M(\theta)\, dz\, d\theta, \]

where the integral is to be extended over the region

\[ S: \quad 0 < z\, \theta_1 \cdots \theta_{m'} < \lambda_\epsilon^{2/N}, \quad 0 < \theta_i < 1, \quad 0 < z < 1. \]

Let us integrate first with respect to $z$ and then with respect to the $\theta$'s; we have


(5.19) 


t(a, x«) — f 


»"0 rir[$(iV — 1)] 


B'tK^ - m), Um ~_1) -f v ; <p] 
B[|(1V - m), K m — 1) + v) 


M do, 


where St is the set 110. < \\' N , 0 < 0, < 1, and 


\[ B'(u, v; \varphi) = \int_{0}^{\varphi} z^{u-1}(1-z)^{v-1}\, dz = \int_{1-\varphi}^{1} z^{v-1}(1-z)^{u-1}\, dz = B(v, u; 1 - \varphi), \tag{5.20} \]

$\varphi(\theta)$ being the upper limit for $z$ for fixed $\theta$. It is clear that the subset of $S_\theta$ for
which $\varphi(\theta) < 1$ will not be of measure zero in the $\theta$-space, since we assume that
$0 < \lambda_\epsilon < 1$.
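
The reflection relation between the two kinds of incomplete Beta integral, namely that the integral of $z^{u-1}(1-z)^{v-1}$ from 0 to $\varphi$ equals the integral of $z^{v-1}(1-z)^{u-1}$ from $1 - \varphi$ to 1, follows from the substitution $z \to 1 - z$. A brief numerical check is sketched below; the Simpson integrator, the grid of parameter values (all at least 1, so that the integrands stay bounded) and the function names are assumptions made for the illustration.

```python
def simpson(f, a, b, steps=2000):
    """Composite Simpson's rule on [a, b] with an even number of steps."""
    h = (b - a) / steps
    total = f(a) + f(b)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * f(a + i * h)
    return total * h / 3.0


def lower_incomplete_beta(u, v, phi):
    """Integral of z^(u-1) (1-z)^(v-1) from 0 to phi."""
    return simpson(lambda z: z ** (u - 1.0) * (1.0 - z) ** (v - 1.0), 0.0, phi)


def upper_incomplete_beta(u, v, t):
    """Integral of z^(u-1) (1-z)^(v-1) from t to 1."""
    return simpson(
        lambda z: z ** (u - 1.0) * max(0.0, 1.0 - z) ** (v - 1.0), t, 1.0
    )


largest_difference = max(
    abs(lower_incomplete_beta(u, v, phi) - upper_incomplete_beta(v, u, 1.0 - phi))
    for u in (1.0, 2.5, 4.0)
    for v in (1.0, 2.5, 4.0)
    for phi in (0.2, 0.5, 0.8)
)
print(largest_difference)
```

The two integrals agree to within the accuracy of the quadrature on this grid.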

The relation between (5.19) and the corresponding expression for the multiple
correlation coefficient without fixed variates (the case $\gamma = 0$ in (4.4)) may be
clearer if we put


(5.21) 


P = 1 - A = 


where F"" is the inverse of B mm in || B ||, and B\ l is the inverse of B n in II B, 
hen the required probability of rejection when p has any fixed value is 

i(- P , i - \\'«) = [ £ t (i _ =)«»-« r ft(y - l) + v] 

Jss »=0 v\ 


mw - 1)] 

B(Km - 1) + v, UN 


m), 1 - <p) 


B[j(m — 1) + v, — m] 


'M do, 


where we have used the relation (5.20) between the incomplete Beta functions.
Differentiating with respect to $\rho$ before performing the integration with respect
to the $\theta$'s, we find by a computation similar to that in section 4 that each term
in the series is positive except where $\varphi(\theta) = 1$; so that we have

~(- P , i-xD>o (x.^1,0). 

d P 


And by (5.21), we then have 


3 *J 

bbL 


> o. 


Since the argument is clearly independent of which pair of sets $(\mu \neq \nu)$ we take, it
follows that the test is locally unbiased. We have therefore proved:

Theorem III. If $x^1, \ldots, x^m$ have the joint normal distribution (5.1), then the
likelihood ratio test for the hypothesis that the $x$'s are independent in sets is locally
unbiased.

In certain types of statistical material it may be important to consider, not
the independence of the $x$'s themselves, but of their deviations from regression
functions. For example, in the case of several related time series, it may be
desirable to eliminate the trend of each $x^i$ by means of, say, a second degree
polynomial in $t$. Consider then in general a population whose distribution function
is of the form




(p, v = m + 1, ■ • ■, m + q) 


with unknown B lj and Cl The likelihood ratio for testing the hypothesis IIi 
that the sets of deviations 

x 1 - cy, . . . ,> - (7“*x"; • • • ; x ra '- i+l - c;v 

are independent is 


x, = /jrr»» 

\dx • • • a : 


where 


d i] = s(xi - cy a )(xi - cw a ) 

and Cl is the usual least squares estimate of Cj ,, given by 

cy = a* 

with 


a" = VxWa (r, s = 1, • • • , m + g). 

An examination of the characteristic function of the $d^{ij}$ shows that their
distribution law is the same as that of the $v^{ij}$ of the preceding discussion, except
for the fact that $N - 1$ is replaced by $N - q$. Consequently the above results
on freedom from bias, and also those of the next section, apply equally well to
the $\lambda_I$ test for the independence of deviations from regression functions.

6. On the moments of $\lambda_I^{2/N}$. Although we have succeeded in proving the
unbiased nature of the preceding test only in the local sense, we can show that the
moments of the criterion $\lambda_I^{2/N}$ have a property which seems very closely related to
that of furnishing a completely unbiased test. For it can be shown that each
of the quantities

\[ E[(\lambda_I^{2/N})^h], \qquad h = 1, 2, \ldots \]

is greater when $H_I$ is true than when any alternative $H'$ holds. It will perhaps
be sufficient to prove this statement in detail for the case where $h = 1$ and
where $H_I$ is the hypothesis that the matrix $\|B_{ij}\|$ has the form
$\|B_{i_1 j_1}\| \dotplus \|B_{i_2 j_2}\| \dotplus \|B_{i_3 j_3}\|$:


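$$\begin{Vmatrix}
B_{11} & B_{12} & 0 & 0 & 0 \\
B_{21} & B_{22} & 0 & 0 & 0 \\
0 & 0 & B_{33} & B_{34} & 0 \\
0 & 0 & B_{43} & B_{44} & 0 \\
0 & 0 & 0 & 0 & \|B_{i_3 j_3}\|
\end{Vmatrix}$$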


In the notation of the preceding section we then have

$$i_1, j_1 = 1, 2; \qquad i_2, j_2 = 3, 4; \qquad i_3, j_3 = 5, \dots, m.$$

Even when H is not true we find that 

("ft 1 ) wn a' 1 1* Im’ 111 r*l = N — 1, m) 0(8, N — 1 + 2h> m — 4) 

' m N-l + 2h, m y " G(B, N - 1, m -~4).’ 

where B' ,H — B al \ Using the definition of the G’s in section 5 and the Jacobi 
theorem, we can write (6.1) in the form 

E[\ v ' 1 f |>* n = K h B~ h 

where B is the determinant of the matrix composed of the first four rows and 
columns of || B,j ||. In the general case we therefore have 


Bn 

Bn 

Bn 

B u 

Bn 

Bn 

Bn 

Bn 

Bn 

Bn 

Bn 

Bn 

B a 

Bq 

Ba 

Bu 


Sa V B T 7* B "s T B “•• + + f “ £ ” 

. ,,#1 , ,* t (j 3 respectively, mdicatmg this replacement by a 

prime, we obtain J 


dt. 


( 6 . 2 ) 


^[(X 2 '") 1 ] = Kif 

Treating $B'$ as a bordered determinant, we can reduce it to
b ' = 5(i 2 8) (i + 

= 5 (u) (l + + B'dlbtftf) 

= S w ( 1 + 

= 5(1 + 1 + 5‘ ( {}»t« , f« ) )(l + 5{i&<?£, ( ’>)(l + SliUtW, 


where the subscripts on the B's indicate the sets of $\xi$'s still contained in the
determinants, and $\|B^{ij}\| = \|B_{ij}\|^{-1}$. Similarly,

(6.4) 5' 


= 5(i + 

the inverse now being taken with respect to || 5 ||. 

But between, say, 5(ij) and B(U?, there is the relation 


/C K\ 6*2^2 E*2J2 D»2<3 D D7JJ2 

1,0.o; -0(12) — -O (12) — o (12)13(12) tjjj-D (12) , 

where || 5( 12 ),,„ || = || 5(}j' || _1 , that is, the inverse of the matrix obtained by 
deleting the first four rows and columns of || B\u) ||. Consequently 


with equality holding only for those values of the $\xi$'s for which

1^5$; = 0 1, = 5, • • • , m. 


And this set of £’s will not make up the entire £ space unless || 5,-, || = 
|| 5 || + || B, 3U ||. Applying the same kind of reasoning to the other quad¬ 
ratic forms in (6 4), we can therefore show that 

l 5 1( 


< 5- 1 / (l + 5 ,I, ' 1 ^ ) $£ ) r‘ (y+,) •••(! + d£. 


The last form can be reduced to a sum of squares with unit coefficients by a
linear transformation of the $\xi^{(1)}$'s; thus

f £ j ( W -l) £/ -J<W-l,£,-l^ 

( 6 . 6 ) 

< £ _1 /1 &\ith ra+ s iih $ tftr* 1 *- 0 • •. (i+s£ii } £i; , )^ (w+i) d£. 

And by making use of the fact that 

5(128) = 5(128) • | 5(u) <i>, |, 
we can express the right-hand side of (6.6) as

s-'/r'd+a'”'i“>O' 1 "*’ - a+<*. 

This in turn becomes [cf. (6.4)]

b-\ j i B mnn r‘(i+ B' in ^ tfr iN (i+sar $ 

= /1 B m , m r x (i + 

X (1+ )-*"(! + dl 


At this stage we can write

I B(i»nu I = I B hn |(1 + $>), 

where ||.Buj m || = || jB<i)uA IP^'a-nd apply the relation 

Ba { r = B\\] 1 - BwB Win B\l h , \\& mtt , t II = II B'dl' ir 1 . 

Therefore, , 

dSpim < 8!i!‘(?«!!’, 

unless fil’fia;* = 0 (i 2 = 3,’4). We can thus continue as follows 

J b!W-P b ,-J(w-i) j$,-i 

< I Am I" f (l + {<Y)~ i<W+l) 

X (1 + d£ t 

Transforming the $\xi^{(2)}$'s, we get

I-Bun r f \B*i) ul |“»a + B*"’'! ( { ?li ( »): i(N+1 \l + S^ , ^ ) )- Kw+1) 

' .n X U +2*K ) r u, (l + S^^)- i(w+1) df 

Since | B$ n | 1 = | |, this becomes 

l B un r / (i + + 2 sM’) H( w+j) 

- x a+2#> + s z\ntfr il " w dj 

= / (1+ 2{<l ) ^ ) )- lJr (l +2€g ) ^ , )-» <w+0 

d{. 


. X (1 + 2^ , ^>)-**(l + 

Collecting these results, we finally obtain

Ki J B'' 1 

(6 ' 7) < Ki f (1 + + 2 fS’ i,™) _i( " +U 

with equality only in case $H_I$ is true. But the right side of (6.7) is the first
moment of $\lambda_I^{2/N}$ computed under the hypothesis $H_I$, while the left side gives
the corresponding moment in the general case.

The possibility of carrying out this reduction for the case in which the matrix
$\|B_{ij}\|$ has more than two blocks, or blocks of unequal size, seems sufficiently
clear. And to obtain higher moments, we have only to introduce the proper
number of $\xi$'s into each set. We then have:

Theorem IIIa. Let $\lambda_I$ be the likelihood ratio appropriate to testing the hypothesis
$H_I$ that the normally distributed variates $x^1, \dots, x^m$ fall into the mutually
independent sets $x^1, \dots, x^{m_1}; \ \dots; \ x^{m_{p-1}+1}, \dots, x^m$. Then the expected value of
$(\lambda_I^{2/N})^h$, $h = \tfrac{1}{2}, 1, \tfrac{3}{2}, \dots$, is greater under the null hypothesis $H_I$ than under any
alternative hypothesis in $\Omega$.
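
The statement of Theorem IIIa can be illustrated, though of course not proved, by simulation. The sketch below is an assumption-laden toy: it takes two single-variate sets, for which the criterion is assumed to reduce to $|v| / (v^{11} v^{22}) = 1 - r^2$, and compares the simulated mean of $(\lambda_I^{2/N})^h$ at correlation zero with its mean at a non-zero correlation. The names `criterion` and `mean_moment`, the sample size, and the seed are all hypothetical choices.

```python
import random


def criterion(sample):
    """|v| / (v11 * v22) for a list of (x, y) pairs, i.e. 1 - r^2."""
    n = len(sample)
    m1 = sum(x for x, _ in sample) / n
    m2 = sum(y for _, y in sample) / n
    v11 = sum((x - m1) ** 2 for x, _ in sample)
    v22 = sum((y - m2) ** 2 for _, y in sample)
    v12 = sum((x - m1) * (y - m2) for x, y in sample)
    return (v11 * v22 - v12 ** 2) / (v11 * v22)


def mean_moment(rho, h, n_obs=20, reps=2000, seed=1):
    """Monte Carlo estimate of E[criterion ** h] for bivariate normal data."""
    rng = random.Random(seed)
    scale = (1.0 - rho ** 2) ** 0.5
    total = 0.0
    for _ in range(reps):
        sample = []
        for _ in range(n_obs):
            z1 = rng.gauss(0.0, 1.0)
            z2 = rng.gauss(0.0, 1.0)
            sample.append((z1, rho * z1 + scale * z2))
        total += criterion(sample) ** h
    return total / reps


null_mean = mean_moment(0.0, 1.0)
alt_mean = mean_moment(0.6, 1.0)
print(null_mean, alt_mean)
```

With these settings the simulated mean under independence comes out well above the mean under correlation 0.6, in line with the ordering the theorem asserts for the expected values.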


7. The general regression problem. Let the variates $x^1, \dots, x^t$ be
distributed according to the law


(7.1) 


B ti 1 * 


Throughout this section, let the ranges of the indices be

$$\begin{aligned}
i, j &= 1, \dots, t; & p, q &= t + 1, \dots, m; \\
r, s &= 1, \dots, m; & r', s' &= 1, \dots, t + q; \\
\mu, \nu &= t + 1, \dots, t + q; & \sigma, \tau &= t + q + 1, \dots, m.
\end{aligned}$$

In (7.1) we therefore have t random variates, and $m - t$ fixed variates.
Consider the hypothesis H that the $x^i$ are independent of the last set of x's, namely
$x^\sigma$. We have

$$\Omega: \quad \|B_{ij}\| \text{ positive definite}, \qquad -\infty < C^i_p < \infty,$$

while for $\omega$ we impose the additional requirement

$$C^i_\sigma = 0.$$

Thus in general we have for the distribution of random samples $O_N$, $N > m$,


(7.2) 


——— e 


t\ ni 





while when H is true, we have 


(7.3) 


P = 






Differentiating (7.2) with respect to the B's and C’s and setting the derivatives 
equal to zero gives us the conditions 

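(7.4)    $$C^i_p \sum_{\alpha=1}^{N} x^p_\alpha x^q_\alpha = \sum_{\alpha=1}^{N} x^i_\alpha x^q_\alpha,$$

(7.5)    $$B^{ij} = \frac{2}{N} \sum_{\alpha=1}^{N} (x^i_\alpha - C^i_p x^p_\alpha)(x^j_\alpha - C^j_q x^q_\alpha).$$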


As in section 2, we put

$$a^{rs} = \sum_{\alpha=1}^{N} x^r_\alpha x^s_\alpha,$$

and assume that the fixed values $x^p_\alpha$ have been so chosen that $\|a^{pq}\|$ is positive
definite. Then (7.4) and (7.5) can be combined to give


\ 

D»i _ ^ j n <! n' r n' n Q h — ^ 

“ ~ ( fl 0 ) — Jj a 1 

where || a' p , ||~ J 

= || o” ||. It'then follows that 


Similarly, 

P -- 

where 

dY = a <} - aX,a’ : , || a^. Ip 1 = Ho"’ 


The matrix || a 1 will be positive definite except for a set of probability zero, 
so that we can consider a 3 j as the inverse of the matrix obtained by removing 
the last m - t rows and columns of the inverse of || a” ||, and || dl s || as the 
inverse of the matrix obtained by removing the last q rows and columns of 
11 ° r || *■ Then by the Jacobi theorem 



i^r 



so that the appropriate likelihood ratio is given by

(7.6)


It will be advantageous to complete the matrix $\|B_{ij}\|$ in (7.1) by defining

$$B_{ip} = B_{pi} = -B_{ij} C^j_p, \qquad B_{pq} = C^i_p B_{ij} C^j_q.$$

(Evidently $B_{ip} = 0$ for $i = 1, \dots, t$ and fixed p, if and only if $C^j_p = 0$,
$j = 1, \dots, t$.) We can now write (7.2) as

(7.7)    $$P(x, B) = \pi^{-\frac{1}{2}Nt}\, |B_{ij}|^{\frac{1}{2}N}\, e^{-B_{rs} a^{rs}}.$$

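We next notice that $\lambda$ is invariant under the transformations

$$x^i \to \alpha^i_j x^j, \qquad x^p \to \beta^p_q x^q,$$

so that if we put

$$I(B, \lambda_\epsilon) = \int_S P(x, B)\, dx^1_1 \cdots dx^t_N,$$

where the integral is extended over the region

$$S: \quad 0 < \lambda < \lambda_\epsilon,$$

it turns out that

$$I(B, \lambda_\epsilon) \equiv I(B^*, \lambda_\epsilon),$$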

provided 

P* = OilB k ia\, P,* = a k { B k)1 , B*„ ~ OttBkrfil. 

To prove the locally unbiased character of the test, we may therefore consider 
the derivatives 


ox> 1<y diJtvoiJjT 

and assume that || P* || and || aT || are in diagonal form We also observe 
that X is unaltered by the transformation 


s’ -> x 1 + B' k B^. 


We therefore have 


r . , IS* I’* f ~ J 

J(B > X *) = L ^~i/ ^ 


a 

dBt 


/(Bo, X.) 


-2 


P*Q P 

ir iAr< 


I 


5 a«l 


“ 1 dx, 


Thus, 
which is easily seen to be zero. Again, consider a non-repeated second partial
derivative, say



This plainly vanishes if k ^ l, but it is by no means easy to see what happens 
when k = l, even w hen j rf. Let us therefore study the distribution law of 
$\lambda^{2/N}$ for the case

$$B_{i\sigma} = 0, \qquad i \ne 1.$$

(We shall not, however, assume that the transformation $B \to B^*$ has been
made on the B's.)

Define 

B n = B„ - B r ,B'*B n , 
a 1 = a° T - a r \ v a n , 

where ||a F || now stands for the inverse of |[ a'"’ || These expressions will 
arise when we adapt Wilks’ method of moment generating operators [13], based 
on the identity 


(7.8) J edx\... dri, = exp (-B Pi a p *) 


to the problem We shall understand from now on that B = \ B t , | and 

I' ^ ~ II II ■ Let us rearrange the form in the exponential on the 

right, thus. 

= (vr + 2 B.y + B„a° J - 2 B^’B^cT 


- B^B^apfiT) - BciB' 3 BirST 

= Q - B, t B'’B, r a^ 

= Q — B' 3 y,j. 


'«>, ana a 


A subscript 0 will denote the result of replacing B r ,., by + fl r ,, 

MW thT'I m ?+ Ca f te * + hat ! ach /3 :'" has been replaced b y ^ + €"€.'• r Consider 

mente ha^ '‘ 8ht “ (78) ,tter thBS,! 

Let us integrate first with respect to the C . Wilks has shown how to write
Qp in the form 

<2,3 = — Qip + B pg pa vg + — 2Bp,pBf)' , a 7>v !; 1 t; v , 

tip 

where 

Qtp = B^B^B^aT + 2 B^B^B^aT + B^Bp' 3 B, r a" l a f ,a’ T . 


This latter expression is thus free of the . Consequently, 


/^'Mf)' Vrv 


-i Jt e d' ls +Q' n f 


where 


Qv = ^aM P B P >pB , p' 1 Q{a'’'B qk pBp kl b), 

tip 


which can be written 

{BrtBFBnB'f 1 te,aT + 2 B&B'?B A B? 1 tfiaT 

Bp 

+ B n Bj,'’B Th B' M te,a r *ap, a") 

The method of reduction used by Wilks can now be applied to Q[p and Qip , 
and gives 

Q[p + Qip = B pi pByB„pcr + 2B, l pB'p 1 B 1 ,<r + BnByBfrcrOpvr, 
an expression which does not involve the $\xi$'s. Thus

(710) f e~ Q e = 7T l3 1 cT |“* BJ iQ e~ Qf ■ Bp. 


Now the quantity 


_ v (.V'X’Y 


Bp 


where $B^{ij}$ stands for the cofactor of $B_{ij}$ in $\|B_{ij}\|$, can be expressed in terms
of $B_\beta$, provided we use our assumption that $B_{i\sigma} = 0$, $i \ne 1$, whereupon $y_{ij} B^{ij}_\beta$
reduces to the single term $y B^{11}_\beta$. In fact, we have

E[gp\ a r ‘ \ h ] = K ]J +(N - m + t + 1 - *, 2ft) | a m \ \' NI Bp {iN+l ' ) 

fl 

X exp (—Bpgpa™) = Ki[+.ir im | a vq \ h BJ h e~^ £ Mil" B -M*-i)+h+r] 

•-1 p-0 V! 


(7.11) 
where, following the notation used by Wilks [13],

Bf = K = T~ iNt B iN exp (-5 M a p9 ), 


H a > b ) 


r»(o + 6)1 
rftal 


And (7.11) can be written as


(7.12) 


*“»1 


T[W -q) + h) a' 
U »l r[i( N - q) + h + »] du* ’ 


v y rutw - g; -r m d / r -hw--«>+*k 

X 2j I wri /ht ^ i 1. ' ~l 1)1 


. fj v stands for the result of replacing Bn by Bn — u. Changing fir',' into 
wl I tand integrating, we then find that by virtue of (7.10) 

fir'i' ' ’ 


EM a" |"| a r '“ T‘] = Kr iN> II ^-|o M j*| aT r | -*BpB~** 

0^ s/ . -ktt/ T[$(N ~ ?) + h] 0' f R /~U(/f-«HM 

x T rm~-w+r+p]w J Su 




w—0 


Now 

J B'fu lUN ~ t)+X] dii = 3pu t<w '~ 1 “ 5)+ * 1 ir 11 - q + 2h + 1 ~ i, -1), 


so that (7.13) becomes

m i«" i* i« rV hw-m. + t + 1-i^h) 

(7 .14), . X II HN - ? + 2A + 1 - i, -1)| o pf |* 1 a v 


V Rr»«»-«d V y" - ?) + h) a’ 

* ' h id fm -q) + h + y)W { <3u }l 

Comparing (7.14) with (7.12), and making use of the fact that 


H a > “ 1)^(1 — 1 , - 1 ) • - • Ha - 2h + 1 , - 1 ) . Ha, “ 2 h), 

we thus have 

E\g?\a r ‘ IV * I A 1 = &r ,NI n HN - m + t + 1 - i, 2k) 


t 

xUHN-q + lh + l-i, - 2 h)\a r " |* | a'" \~ h Bj k f ,)fl 


v V j -q)+h] a 

Sip !mN - q) + h + „] *r 1— ■ 

Setting the $\beta$'s equal to zero, performing the differentiation, and recalling the
definitions of K and Qa, we then find

E [(\ 2l!f ) h ] = ri i'iN - m + t + 1 - i, 2h) ri f(IV - q + 2h + 1 - i, -2h ) 
(7.16) 

^ _„ fl u (yB u Y T[i(N ~q)+ h] T &(N - q) + v] 
i=L v! T[i(N ~q) + h+v] nm - q)} ' 

Taking the first factor from each product, we can convert (7.15) into 

t t 

n MN - m + t + 1 - i, 2 h) n * (N - q + 2 h + 1 - i, -2 h) 

t-a t«2 

^ -KBU V (vB n Y rft(ZV - m + 0 + h] Tim - ff) + v] 

^ v! r[l(^-w + 0] 'nm-q) + h + vY 

This last product of ratios of $\Gamma$'s is equivalent to

_ T\m_ — g) + »3 _ r[§(m -t — q)+ v]Tim — m + t) + h] 

r[|(-W - m + OMKw -t — q) + v] r[^(iV r — q) + h + v] 

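Thus the moments of $\lambda^{2/N}$ are connected with an integral equation of type B
[12] and $\lambda^{2/N}$ is distributed like the product

$$z\,\theta_2 \cdots \theta_t, \qquad 0 < z < 1, \quad 0 < \theta_i < 1,$$

where the joint distribution of the $\theta$'s is

$$f(\theta) = \prod_{i=2}^{t} \frac{\Gamma[\frac{1}{2}(N - q + 1 - i)]}{\Gamma[\frac{1}{2}(N - m + t + 1 - i)]\,\Gamma[\frac{1}{2}(m - t - q)]}\; \theta_i^{\frac{1}{2}(N - m + t + 1 - i) - 1} (1 - \theta_i)^{\frac{1}{2}(m - t - q) - 1}$$

and z is distributed independently of the $\theta$'s with the distribution

(7.16)    $$F(z) = e^{-yB^{11}} \sum_{\nu=0}^{\infty} \frac{(yB^{11})^\nu}{\nu!}\, \frac{z^{\frac{1}{2}(N - m + t) - 1} (1 - z)^{\frac{1}{2}(m - t - q) + \nu - 1}}{B[\frac{1}{2}(N - m + t),\ \frac{1}{2}(m - t - q) + \nu]}.$$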

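The probability that $0 < \lambda < \lambda_\epsilon$ is therefore

$$I(y, \lambda_\epsilon) = \int_S f(\theta) F(z)\, dz\, d\theta_2 \cdots d\theta_t,$$

where S is the region $0 < \theta_2 \cdots \theta_t\, z < \lambda_\epsilon^{2/N}$. Putting $\varphi(\theta)$ for the upper limit
of z in S for fixed $\theta$, and $S_\theta$ for the projection of S into the $\theta$ space, we then have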

Hy, Xo) = //(0) e ^ u £ / mm - ir7nr~i —nn* *• 

•>a, [ »-o v\ J o B[f(w — m •+■ 1), \(m -r t — q) + v] J 

If we replace z by (1 — z) we then find 


Kv, xo) = [ m 

Jsa 


vB u yv (yJ u ) ll B[^(TO - t - g) + v, - m -j- 0; 1 — 


»-o vl B[i(m — t — g) + v, — m -f 


H)1 J 

As far as y is concerned, (7.17) is essentially the same as (2.8). The computation
which was made there, together with the type of reasoning employed in
the latter part of section 5 in connection with the independence test for several
blocks, then shows that

$$\frac{\partial}{\partial y} I(y, \lambda_\epsilon) > 0 \qquad (0 < \epsilon < 1).$$

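Remembering that

$$y = a^{\sigma\tau} B_{1\sigma} B_{1\tau},$$

we see that

$$\frac{\partial^2 y}{\partial B_{1\sigma}\, \partial B_{1\tau}} = 2a^{\sigma\tau},$$

and we remark that the assumed positive definiteness of $\|a^{pq}\|$ implies that of
$\|a^{\sigma\tau}\|$. Hence the relation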



together with the fact that we could have obtained the analogue of (7.17)
under the assumption

$$B_{i\sigma} = 0, \qquad i \ne i_0,$$

where $i_0$ is any fixed number in the set $1, \dots, t$, shows that the matrix of
second partial derivatives is positive definite when H is true.

Thus we have 

Theorem IV. Let $x^1, \dots, x^t$ be normally distributed about means which are
linear functions of certain fixed variates $x^{t+1}, \dots, x^m$. Then the likelihood ratio
test for the hypothesis that the distribution of $x^1, \dots, x^t$ depends only on a selected
subset $x^{t+1}, \dots, x^{t+q}$ of the fixed variates is locally unbiased.

The result of this section has its most immediate application to those problems
in the analysis of variance which require simultaneous consideration of several
interrelated dependent variables $x^1, \dots, x^t$ in conjunction with a given set of
independent variables $x^{t+1}, \dots, x^m$ [15]. For the usual hypothesis to be tested
in this case is that $x^1, \dots, x^t$ are jointly independent of, say, $x^{t+q+1}, \dots, x^m$.

To return to the general case of (7.1), the method of this section can also be
used to test the hypothesis that the regression coefficients referring to the $x^\sigma$
have particular values, say

$$C^i_\sigma = C^i_{\sigma 0}, \qquad \sigma = t + q + 1, \dots, m,$$

the remaining C's and the B's being left unspecified. Since we have

$$x^i - C^i_p x^p = (x^i - C^i_{\sigma 0} x^\sigma) - C^i_\mu x^\mu - (C^i_\sigma - C^i_{\sigma 0}) x^\sigma,$$

by the device of replacing $x^i_\alpha$ by $x^i_\alpha - C^i_{\sigma 0} x^\sigma_\alpha$, we can reduce this problem to
that of testing the hypothesis that

$$C'^i_\sigma = C^i_\sigma - C^i_{\sigma 0} = 0.$$


Similarly, the problem of testing whether the linear functions $u^i_\sigma = a^\tau_\sigma C^i_\tau$ have
specified values $u^i_{\sigma 0}$ comes under the same heading [7].

A particularly interesting case of the general regression problem is that in
which $m = t + q + 1$, so that the null hypothesis H states that the chance
variables $x^i$ are independent of the fixed variate $x^m$, though they may depend
upon $x^{t+1}, \dots, x^{t+q}$. In this case we are able to find the exact distribution
law of $\lambda^{2/N}$ without assuming that any of the regression coefficients $C^i_p$ are zero.
For the quantity


(7.18) 


£ 


p=0 


(vvB 13 )' 

v' 


[§ (N— 


which would have occurred in (7.11) had it not been for the restriction $B_{i\sigma} = 0$
($i \ne 1$), can now be expressed in terms of $B_\beta$ even without this restriction.
By definition

$$y_{ij} B^{ij} = a^{mm} B^{ij} B_{mi} B_{mj},$$

and the vanishing of the $B_{mi}$ is equivalent to the vanishing of the regression
coefficients $C^i_m$ associated with $x^m$. And since

$$|B_{ij} - u a^{mm} B_{mi} B_{mj}| = B - u a^{mm} B^{ij} B_{mi} B_{mj},$$

we can write (7.18) in the form 

T i rftW - g) + ft] £ 

v! nm -q)+h+v]du > 1 Jl '"°’ 

where

$$\|B_{\beta u}\| = \|B_{ij\beta} - u a^{mm} B_{mi} B_{mj}\|$$

is positive definite provided u is sufficiently small. Thus the moments of $\lambda^{2/N}$
can be found from (7.15) if we put $a^{mm} B^{ij} B_{mi} B_{mj} = y_{ij} B^{ij}$ in place of $yB^{11}$.
Moreover, it can be seen that when the value $m = t + q + 1$ is substituted
into (7.15), that expression reduces to


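$$E[(\lambda^{2/N})^h] = e^{-y_{ij}B^{ij}} \sum_{\nu=0}^{\infty} \frac{(y_{ij}B^{ij})^\nu}{\nu!}\, \frac{B[\frac{1}{2}(N - m + 1) + h,\ \frac{1}{2}(m - q - 1) + \nu]}{B[\frac{1}{2}(N - m + 1),\ \frac{1}{2}(m - q - 1) + \nu]},$$

so that $\lambda^{2/N}$ is distributed like w, where

(7.19)    $$f(w) = e^{-y_{ij}B^{ij}} \sum_{\nu=0}^{\infty} \frac{(y_{ij}B^{ij})^\nu}{\nu!}\, \frac{w^{\frac{1}{2}(N - m + 1) - 1} (1 - w)^{\frac{1}{2}(m - q - 1) + \nu - 1}}{B[\frac{1}{2}(N - m + 1),\ \frac{1}{2}(m - q - 1) + \nu]}.$$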

The distribution law of $\lambda^{2/N}$ for this case is thus closely related to that obtained
in the treatment of the regression problem with one dependent variate in
section 2. Applying the argument used there, we can obtain:

Theorem IVa. The likelihood ratio test for the hypothesis that in a population
of the type (7.1) the variates $x^i$ are independent of $x^m$ (the case $m = t + q + 1$
of Theorem IV) is completely unbiased.
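
A distribution of the kind appearing in (7.19), a Poisson-weighted mixture of beta densities, can be explored numerically. The sketch below is only an illustration under assumptions: `delta` is a hypothetical stand-in for the non-centrality quantity in the exponent, `a` and `b` stand for $\frac{1}{2}(N - m + 1)$ and $\frac{1}{2}(m - q - 1)$, the series is truncated at a fixed number of terms, and the numerical values are arbitrary.

```python
import math


def beta_pdf(w, a, b):
    """Beta(a, b) density at 0 < w < 1, computed on the log scale."""
    log_norm = math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
    return math.exp(log_norm + (a - 1.0) * math.log(w)
                    + (b - 1.0) * math.log(1.0 - w))


def mixture_pdf(w, a, b, delta, terms=40):
    """Poisson(delta)-weighted mixture of Beta(a, b + nu) densities."""
    total = 0.0
    for nu in range(terms):
        log_weight = -delta + nu * math.log(delta) - math.lgamma(nu + 1.0)
        total += math.exp(log_weight) * beta_pdf(w, a, b + nu)
    return total


def integrate(func, steps=4000):
    """Midpoint rule on the open interval (0, 1)."""
    h = 1.0 / steps
    return h * sum(func((k + 0.5) * h) for k in range(steps))


# Hypothetical values: N = 12, m = 4, q = 1, so a = 4.5 and b = 1.0.
a, b = 4.5, 1.0
total_mass = integrate(lambda w: mixture_pdf(w, a, b, 1.5))
mean_small = integrate(lambda w: w * mixture_pdf(w, a, b, 0.5))
mean_large = integrate(lambda w: w * mixture_pdf(w, a, b, 3.0))
print(total_mass, mean_small, mean_large)
```

The mixture integrates to one up to truncation and quadrature error, and its mean falls as `delta` grows, which is the direction one would expect if larger departures from the null hypothesis push the criterion toward zero.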

If we specialize the problem somewhat further, considering the case $q = 0$,
$x^m_\alpha = 1$ (so that $m = t + 1$), we find that the likelihood ratio takes the form

$$\lambda^{2/N} = \frac{1}{1 + N v_{ij} \bar{x}^i \bar{x}^j} = \frac{1}{1 + T},$$

where $v^{ij} = \sum_{\alpha=1}^{N} (x^i_\alpha - \bar{x}^i)(x^j_\alpha - \bar{x}^j)$ (with $\|v_{ij}\|$ its inverse), and T is Hotelling's generalization [5]
of Student's ratio. In this case we are testing the hypothesis that the $x^i$ are
distributed with zero means. The exact distribution law of



was recently published by P. L. Hsu [6], who obtained it in a very elegant
fashion by means of the Laplace transform. He has also shown that the resulting
test is most powerful in the sense that, of all critical regions S for which

P[x C S) = « + + R(b) 

(where $\epsilon$ and a are independent of the $B_{ij}$ and of the means $b_i$, and R is an
infinitesimal of at least the third order as all $b_i$ tend to zero), the critical region
defined by


- — —, « 

has the largest possible value of a. Tang's tables [11] make it evident that
this largest possible value of a is actually positive and that the test is in fact
unbiased for all values of the b's when $\epsilon = .05$ or $\epsilon = .01$. The results of this
section show further that the test has this property for all probability levels.
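
The algebraic identity behind the form $1/(1 + T)$ can be checked on a small data set. The sketch below assumes, as an interpretation rather than a quotation of the paper, that the criterion for zero means is the ratio of the determinant of the cross-product matrix about the sample means to the determinant of the raw cross-product matrix, and that $T = N v_{ij} \bar{x}^i \bar{x}^j$ with $\|v_{ij}\|$ the inverse of $\|v^{ij}\|$. The function name and the six bivariate observations are hypothetical.

```python
def likelihood_ratio_and_hotelling(sample):
    """Return (|V| / |A|, 1 / (1 + T)) for a list of 2-dimensional rows.

    V holds cross-products about the sample means, A holds raw
    cross-products, and T = N * xbar' V^{-1} xbar.
    """
    n = len(sample)
    xbar = [sum(row[k] for row in sample) / n for k in range(2)]
    v = [[sum((row[i] - xbar[i]) * (row[j] - xbar[j]) for row in sample)
          for j in range(2)] for i in range(2)]
    raw = [[sum(row[i] * row[j] for row in sample) for j in range(2)]
           for i in range(2)]
    det_v = v[0][0] * v[1][1] - v[0][1] * v[1][0]
    det_raw = raw[0][0] * raw[1][1] - raw[0][1] * raw[1][0]
    # Quadratic form xbar' V^{-1} xbar written out for the 2 x 2 case.
    quad = (v[1][1] * xbar[0] ** 2
            - 2.0 * v[0][1] * xbar[0] * xbar[1]
            + v[0][0] * xbar[1] ** 2) / det_v
    t_stat = n * quad
    return det_v / det_raw, 1.0 / (1.0 + t_stat)


data = [(1.2, 0.7), (0.4, -0.3), (2.1, 1.5), (-0.6, 0.2), (1.0, 1.1), (0.3, -0.8)]
ratio, from_t = likelihood_ratio_and_hotelling(data)
print(ratio, from_t)
```

The two numbers agree to rounding error because the raw cross-product matrix equals $V + N \bar{x}\bar{x}'$, a rank-one modification of V whose determinant is $|V|(1 + T)$.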

The application of Hotelling's T is by no means confined to the above case;
other hypotheses which can be tested by means of this statistic are discussed
by Hsu [6]. It is now known that the Studentized $D^2$ devised
by Mahalanobis for measuring the "distance" between two normal multivariate
populations is proportional to Hotelling's T. This is pointed out
by the authors of [1], who have obtained the exact distribution of $D^2$
for the case in which the populations from which the samples are drawn
are assumed to have the same matrix of variances and covariances, but are
allowed to have different means; their work, however, is quite independent
of Hsu's. They also note that ...

8. Summary. The method of likelihood-ratios is of practical as well as theoretical
importance, because it provides a unified approach to the problem of
testing statistical hypotheses. In this paper we have investigated many of the
tests which this method yields when applied to hypotheses about sets of regression
coefficients and covariances in normal populations. By studying the
probability functions of the corresponding $\lambda$-criteria we are able to show that
these tests are “good,” in the sense that they are unbiased even for small samples.

Among the completely unbiased tests which can be based on the likelihood-ratio
method, our discussion includes the multiple correlation coefficient, with
or without fixed variates [13]; Hotelling's generalized T test [6] and the statistically
equivalent “Studentized $D^2$” [1]; the ordinary analysis of variance
and covariance for orthogonal or non-orthogonal data [11, 16]; as well as related
tests of linear hypotheses in the case of one chance variable.

With respect to the analysis of variance for two or more variables [15] and
certain other hypotheses regarding regression coefficients in multivariate populations,
though there are indications that the tests are completely unbiased, we
have succeeded in demonstrating this property only in the local sense.

Finally, the likelihood-ratio test for the hypothesis that the variates fall into
certain specified mutually independent sets [14] is shown to be unbiased, at
least locally, and has the additional property described in Theorem IIIa.

In conclusion, much more than a word of acknowledgment is due to Professor 
S. S. Wilks of Princeton University, to whom the writer is greatly indebted for 
advice and encouragement. 

INTRODUCTION

An important portion of algebraic invariant theory has been that devoted to a
certain class of invariants called seminvariants, semi-invariants, or more rarely,
half-invariants. Of these terms, “seminvariant” seems to be the one now
commonly accepted. The same three terms have been applied at various times
and by various writers to a system of moment functions of importance in statistical
theory. The statistician using these terms has frequently done so with
an apology for appropriating a term of the algebraist. As a portion of this
paper we shall show that the moment functions of this system are actually
algebraic seminvariants, and that there are other systems of moment functions
which are equally entitled to the name seminvariant.

The study of the statistical seminvariants of a population leads naturally to
consideration of the problem of obtaining from a sample unbiased estimates of
the value of these seminvariants. Estimates of this kind have been defined
and computed by previous authors, but no simple method of obtaining the
estimates has been given. In this paper a simple procedure for calculation is
given and it is furthermore demonstrated that these estimates form an important
phase of statistical seminvariant theory.

The system of notation used for moment functions is that of R. A. Fisher,
although the actual letters used in representing particular moment functions are
not altogether the same as those used by Fisher. In general, a moment function
of the population has been indicated by a Greek letter, the corresponding sample
moment function by the corresponding English letter and the estimate by the
corresponding capital English letter.

A list of references appears at the end of the paper. Each reference has been 
assigned a number and this number placed in square brackets is used in the body 
of the paper to indicate the reference. Pages of the reference are indicated by 
additional numbers inserted in the parentheses and separated from the reference 
number by a semicolon. 


I. THE RELATION OF THE ALGEBRAIC SEMINVARIANT THEORY TO THE MOMENT
FUNCTIONS OF STATISTICS

The purposes of this chapter are: (1) to review briefly and give adequate 
references to certain important phases of algebraic seminvariant theory, (2) to 
apply this material to the moment functions of statistics. 

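1. Definitions. Any function of the coefficients of the binary form

(1)    $$f = \sum_{i=0}^{n} \binom{n}{i} a_i X^{n-i} Y^i, \qquad a_0 \ne 0,$$

which is invariant under the transformation

(2)    $$X = \gamma_1 \xi + \gamma_2 \eta, \quad Y = \delta_1 \xi + \delta_2 \eta, \qquad \Delta = \begin{vmatrix} \gamma_1 & \gamma_2 \\ \delta_1 & \delta_2 \end{vmatrix} \ne 0,$$

is called an invariant of the form f. See Dickson [1; 31-36].

Any function of the coefficients of f which is invariant under the transformation

(3)    $$X = \xi + \gamma\eta, \quad Y = \eta$$

is called a seminvariant of f.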

The two operators 


( 4 ; 




are fundamental in the theory of algebraic invariants and seminvariants.
Both invariants and seminvariants may be defined by means
of these operators. A necessary and sufficient condition that an homogeneous
isobaric function of the coefficients of f be an invariant is that it be annihilated
by both $\Omega$ and O. See Elliott [2; 113, 124]. The necessary and sufficient
condition that an homogeneous isobaric function of the coefficients of f be a
seminvariant is that it be annihilated by $\Omega$. See Elliott [2; 127].

It should be noted that there is nothing in the definitions above which requires
that invariants or seminvariants be integral, although usually only this type is
discussed. In what follows we shall find it more profitable to discuss homogeneous
isobaric fractional seminvariants, the fractional quality resulting from
the appearance of $a_0$ in the denominator.


2. Complete Systems of Seminvariants. By direct application of the transformation
(3) to f the system of seminvariants [1; 47]


(5) 



r < n, 


is obtained. This system is a complete system [2; 44, 205, 206], in the sense
that all other seminvariants fractional in $a_0$ and of degree 0 are expressible
rationally and integrally in terms of this system.

Other such systems can be defined. The system of minimum degree seminvariants,
the seminvariants of even weight being of degree 2 and those of odd
weight being of degree 3, has played an important role in the algebraic seminvariant
theory. Elliott [2; 207-209] discusses this system and gives the general
formula for the even weight seminvariants of the system. So far as the present
writer has been able to discover, the general formula for the odd weight seminvariants
has never been published, although Hammond [3] may have obtained it.
After some lengthy but not difficult computation the result has been obtained,
so that the last mentioned system of seminvariants is completely defined by


n _ 1 V ( i\./2r\ o,o 2 r-, 

(6) C» +l = 2 (- l)’ +r (. 2? ' ) ~> l + 1 

,-o \i + rj % 


+ r + 1 


2 

do 


_|_ \ ^ ^ 1) t “hi | dj d j &2r — 1 

<-o \ij ao 

It is easily demonstrated that for each of the above seminvariants, and in
fact for any seminvariant, the sum of the numerical coefficients is zero. Dickson
[1; 55] gives a suggestion leading to a very simple proof.


3. The MacMahon Non-Unitary Symmetric Function Principle. Denoting
the roots of $f = 0$ by $\alpha_1, \alpha_2, \dots, \alpha_n$, the r-th power sum of
these roots is defined by

(7)    $$s_r = \sum_{i=1}^{n} \alpha_i^r.$$

The form f may be written $a_0 \prod_{i=1}^{n} (X - \alpha_i Y)$.

By a result due to MacMahon [4; 131] the seminvariants of the form / are 
identical, except for numerical factors, with those symmetric functions of the 
roots of 


( 8 ) 


t—0 


which when expressed in terms of sums of powers of these roots do not contain
$s_1$. MacMahon called such symmetric functions “non-unitary.”

As a result of this theorem, MacMahon was able to discuss the seminvariants 
of a binary form of infinite order by discussing the non-unitary symmetric 

oo 

functions of the roots of 2 P* = 0. 

" 1-0 ll 


4. A Third Complete System of Seminvariants. By application of the result
stated in the previous section, a third complete system of seminvariants can be
immediately obtained. Obviously the power sums $s_r$, $r > 1$, are independent
of $s_1$. By the Waring formula, Burnside and Panton [5; 91-92], if

Z c,Y* = co ft (1 - «,Y) 


(9) 

wherein 


Then for 


)"h) n 

irii7r 2 ! • • • 7T n 1 \Co/ \Cq/ 







»-0 l\ 


( 10 ) 


(-lrvKp - i)\h\ l h) wt .. 

— (r — l)!s r = 2_ W W Van/ 

(20*'... (»!)'* 


Placing $B_r = -(r - 1)!\, s_r$, the B's form a complete system of seminvariants.
This result has some interesting statistical connections which will be mentioned
later.


5. Linearly Independent Seminvariants. It follows from the MacMahon non-unitary
symmetric function principle, or it can be proved easily in other ways,
that the number of linearly independent seminvariants of a given weight r is
equal to the number of partitions of r which contain no unit part. Furthermore
we have at our disposal a simple method for obtaining a set of linearly independent
seminvariants of any given weight.

For many purposes the power product defined by Dwyer [6; 13] is more
useful than the customary monomial symmetric function. The power product
is defined by the right hand member and indicated by the left hand member of

(11)    $$(p_1 \cdots p_r) = \sum \alpha_i^{p_1} \cdots \alpha_j^{p_r},$$

where, for convenience, $p_1 \ge p_2 \ge \cdots \ge p_r$. The monomial symmetric function,
which will be denoted by $M(p_1 \cdots p_r)$, is related to the power product by
the identity

(12)    $$\pi_1!\, \pi_2! \cdots \pi_s!\; M(p_1^{\pi_1} p_2^{\pi_2} \cdots p_s^{\pi_s}) = (p_1^{\pi_1} p_2^{\pi_2} \cdots p_s^{\pi_s}),$$

so that a distinction occurs only when there are repeated exponents in the
summation of (11).

If we desire a system of linearly independent seminvariants of weight 6, by
the MacMahon principle we need only to compute the values of the power
products (6), (42), (33), (222) in terms of the a's. In a somewhat different
form these will be presented later.
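
The counting rule of this section, one linearly independent seminvariant of weight r for each partition of r with no unit part, is easy to enumerate mechanically. The sketch below is an illustration only; the function name `non_unitary_partitions` is hypothetical and the routine simply lists partitions whose parts are all at least 2.

```python
def non_unitary_partitions(weight, largest=None):
    """Yield partitions of `weight` into parts >= 2, in non-increasing order."""
    if largest is None:
        largest = weight
    if weight == 0:
        yield ()
        return
    for part in range(min(weight, largest), 1, -1):
        for rest in non_unitary_partitions(weight - part, part):
            yield (part,) + rest


weight_six = list(non_unitary_partitions(6))
print(weight_six)
```

For weight 6 the enumeration returns the four partitions (6), (42), (33), (222) named in the text, and for weight 8 it returns seven.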

6. The Roberts Theorem. Roberts, see [2; 231] and [5; 108], demonstrated
the existence of a duality relationship between power sums, s's, and coefficients,
a's, such that corresponding to any seminvariant in terms of a's there exists
a seminvariant in terms of s's obtained by replacing $a_i$ by $s_i$. The proof consists
of showing that the annihilator for seminvariants in terms of power sums
is identical in form with $\Omega$, $a_i$ being replaced by $s_i$.

As a result of this duality, each of the systems of seminvariants which have
been obtained yields, upon replacement of $a_i$ by $s_i$, another system of seminvariants.
In particular cases it may happen that the systems are identical
when the identities connecting the $a_i$ and $s_i$ are taken into consideration.

We next wish to show that the systems of power sum seminvariants thus
obtained either are identical with certain well known statistical moment functions
or lead to new ones.

7. Statistical Distributions Represented by Binary Forms. The fact that
statistical distributions may be represented by polynomials has long been
recognized by statisticians, see Thiele [7; 24-26] and Bertilsen [8]. Indeed it
was this fact which led Thiele to the definition of the seminvariants now called
by his name. If we have given n observations $\alpha_1, \alpha_2, \dots, \alpha_n$, form the
polynomial

(13)    $$F = \prod_{i=1}^{n} (Z - \alpha_i) = \sum_{i=0}^{n} \binom{n}{i} \frac{a_i}{a_0} Z^{n-i}.$$

F is not a binary form, but the seminvariant theory of binary forms is applicable
since seminvariants are functions of the differences of the roots and are independent
of the X and Y, which appear merely as convenient symbols to indicate
the various terms of the algebraic form.

For distributions containing an infinite number of items the form F is of
infinite order, but discussion of its seminvariants may be carried on by use of
the MacMahon principle given in section 3.


8. Three Systems of Statistical Seminvariants. Before exhibiting some systems
of statistical seminvariants it may be well to consider the meaning of
“statistical seminvariant,” for this phrase has been undefined. In fact the use
of the phrase is merely a matter of convenience in that it emphasizes the fact
that seminvariant moment functions have not previously been regarded as
algebraic seminvariants. As used here a statistical seminvariant is an algebraic
seminvariant which has some application in statistical theory.

The system of seminvariants (5) yields by application of the Roberts' Theorem
the well known system of statistical seminvariants usually called central moments.
If $\mu'_r = s_r / n$ the general formula may be written


(14) 

The system of seminvariants (6) likewise leads to 
“ ir = 2§ -<> 


(15) 


k*+i = T (~1 ) ,+ Y. 2 _[ ) - 2 i- +1 ' 

,-o V + r/t + r + l Mr 


t 




+ 2J -l)‘ 

i-0 


a have been Used hy statisticians. 

The system (10) leads to the well known Thiele seminvariants


/ \ / / / 


(16) 


k r = z il i)- V| (p- ■ ■. ^ 

••• irTl(2!)* a ... ( r |)'r 


—STi.tlS 1J5T 

Of coefficients. It does not seem that this fart h i f ° r P ° Wer sums m terms 
An equivalent way of stating this ideal to sa^hatThfSe^ 

Xr is, except for the factor - (r - 1) I the sum ! Thlele semul ™nant 

ama ° DS ° b ““ d hy »«i»8 a. moment ^ °' 

i -o i! ’ 


equal to zero. 
It is of historical interest to note that MacMahon published his non-unitary
function principle and the resulting set of seminvariants in 1884. Cayley [8]
published an article in 1885 dealing with this same system. Roberts' Theorem
having been known for some time (probably about 20 years), it seems probable
that MacMahon and Cayley were aware of the Thiele seminvariants four to
five years before Thiele's definition [9] by an entirely different method.


9. Linearly Independent Statistical Seminvariants. At the end of section 5
a method was indicated whereby a complete set of linearly independent
seminvariants of a given weight r could be obtained. It has been noted previously
that the one part symmetric function s_r or (r) leads to the Thiele seminvariant
\lambda_r. As a further illustration consider the power product (22). From a
table of symmetric functions we find that

(22) = \frac{2a_4}{4!\,a_0} - \frac{2a_3 a_1}{3!\,a_0^2} + \frac{a_2^2}{2!\,2!\,a_0^2}
     = \frac{2}{4!} \left( \frac{a_4}{a_0} - \frac{4a_3 a_1}{a_0^2} + \frac{3a_2^2}{a_0^2} \right),

and by the Roberts' Theorem the statistical seminvariant

\mu'_4 - 4\mu'_3\mu'_1 + 3\mu'^2_2

is obtained. In similar fashion a system of linearly independent seminvariants
of weight ≤ 8 has been computed and is given in Table I. For the sake of
brevity they are expressed in terms of central moments. Hence the degree, by
which is meant the maximum degree in the \mu''s, is not apparent in the table.
This definition of degree associates with the statistical seminvariant the degree
(in the usual sense) of the corresponding homogeneous integral seminvariant.
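As a numerical illustration (not part of the original computation), the
following Python sketch checks on a small made-up set of observations that the
weight 4 expression \mu'_4 - 4\mu'_3\mu'_1 + 3\mu'^2_2 is unchanged by a change
of origin, and that it reduces to \mu_4 + 3\mu_2^2 when written in central
moments. The data and the function names are assumptions chosen only for this
illustration.

```python
from fractions import Fraction

def raw_moment(xs, r):
    """r-th sample moment about the origin, as an exact fraction."""
    return sum(Fraction(x) ** r for x in xs) / len(xs)

def central_moment(xs, r):
    """r-th sample moment about the mean."""
    mean = raw_moment(xs, 1)
    return sum((Fraction(x) - mean) ** r for x in xs) / len(xs)

def weight4_seminvariant(xs):
    """mu'_4 - 4 mu'_3 mu'_1 + 3 mu'_2^2, built from moments about the origin."""
    m1, m2, m3, m4 = (raw_moment(xs, r) for r in (1, 2, 3, 4))
    return m4 - 4 * m3 * m1 + 3 * m2 ** 2

data = [1, 4, 6, 7, 12]            # illustrative observations
shifted = [x + 10 for x in data]   # the same observations after a change of origin

value = weight4_seminvariant(data)
same_after_shift = weight4_seminvariant(shifted) == value
matches_central_form = (
    value == central_moment(data, 4) + 3 * central_moment(data, 2) ** 2
)
print(value, same_after_shift, matches_central_form)
```

Exact rational arithmetic is used so that the two comparisons are identities
rather than floating point approximations.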


10. Statistical Invariants. If the transformation

(17)    x = \xi + mk\eta,    y = m\eta

is applied to the binary form f and if, in particular,



one system of invariants of f under this transformation is found to be

(18)    D_r = A_r / A_2^{r/2},    r \le n,

where A_r is defined in (5). By the Roberts Theorem we obtain the fact that
the standard moment \mu_r / \mu_2^{r/2} is an invariant of f under this
transformation. Thus the standard moments, or standard seminvariants in general,
have also an algebraic connection. The effect of the transformation (17) on the
roots of f is indicated by

x - \alpha_i y = \xi + mk\eta - m\alpha_i\eta = \xi - m(\alpha_i - k)\eta.
If m and k are defined as above, the result is the equivalent of measuring in
standard units denoted by

(\alpha_i - \mu'_1) / \sigma.


The system (18) is not a system of algebraic invariants, for algebraic invariants 
must be invariant under rotation, translation and change of scale, or stretching. 
The component parts of the above system are invariant only under the last two 

TABLE I

Linearly Independent Seminvariants of Weight ≤ 8



8 Ma ^8“ 66^6/11 —* 70/i4 2 + 2lO/44/Ja a -{- 280/ij 9 jUs — 105^2* 

6 + lWs - 56 m w*a - 35^i» - 210^* -f UOssVa + 63(W 

5 28^^ + 49^it/ij - 3 S/n a + i20n^i‘ — 490mV« — 630/j 2 ‘ 

8 i **» - 2&wi - 66nun + l05(i» J - 420 mim» ! + 660/i«Vi + 03<W 

4 + 1W» - 56a an + 35^ - 210 m^j s + 140^jVi 

3 fit - 7i±mi + 49 MV 4i 35mi s + 105MOJ2 2 - 70/isVa 

_ 2 Ms + 28|i«aj — 56 msmj + 35>i4 J 

types of transformation. In statistics translation and change of scale ordinarily
constitute the only desired transformations, so that the standard seminvariants

\mu_r/\sigma^r, \lambda_r/\sigma^r, \kappa_r/\sigma^r, \ldots

might well be called statistical invariants.

11. Seminvariants and Invariants of Samples. Consideration of the definition
of seminvariants and invariants shows that:

a moment about the mean is a seminvariant not because it is a function of
deviations from the mean, but because it is a function of the differences of
the observations;

a standard moment is an invariant not because it is a seminvariant divided by
the standard deviation raised to the proper power, but because it is a ratio of
two seminvariants which are of the same order in powers of the observations.

These facts are important from the statistical viewpoint because they show
that seminvariants and invariants of samples are also seminvariants and
invariants of the population from which the samples are drawn.


II. ESTIMATES 

1. Power Product Seminvariants. The Roberts Theorem set up a duality
relationship between seminvariants expressed in terms of coefficients and
seminvariants in terms of power sums. It can be shown that corresponding to each
pair thus determined there exists a third seminvariant expressed in terms of
power products. This leads to what may be called a triple system of
seminvariants, the interrelationships being most apparent when all three
seminvariants are expressed in terms of the notation defined by (11). The
seminvariant

\frac{a_3}{a_0} - \frac{3a_2 a_1}{a_0^2} + \frac{2a_1^3}{a_0^3}

becomes in this notation

\frac{(111)}{n^{(3)}} - \frac{3(11)(1)}{n^{(2)} n} + \frac{2(1)^3}{n^3}.

The corresponding power sum seminvariant is

\frac{(3)}{n} - \frac{3(2)(1)}{n^2} + \frac{2(1)^3}{n^3},

while the power product seminvariant just mentioned is

\frac{(3)}{n} - \frac{3(21)}{n^{(2)}} + \frac{2(111)}{n^{(3)}}.

The value of the power product notation lies in the fact that the numerical
coefficients of the three seminvariants are then identical, while this is not the
case when monomial and elementary symmetric functions are used.
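The power product seminvariant of weight 3 can be evaluated directly on a
sample. The sketch below, in Python, assumes that a power product (p_1 ... p_s)
means the sum of x_i^{p_1} x_j^{p_2} ... over ordered sets of distinct
observations, and that n^{(k)} = n(n-1)...(n-k+1); under those assumptions it
compares the power product form with the familiar estimate
n \sum (x - \bar{x})^3 / ((n-1)(n-2)) of the third Thiele seminvariant. The
function names and the data are hypothetical.

```python
from fractions import Fraction
from itertools import permutations

def power_product(xs, parts):
    """Power product (p1 p2 ... ps): sum over ordered tuples of distinct
    observations of x_i^p1 * x_j^p2 * ...  (brute force enumeration)."""
    total = Fraction(0)
    for idx in permutations(range(len(xs)), len(parts)):
        term = Fraction(1)
        for i, p in zip(idx, parts):
            term *= Fraction(xs[i]) ** p
        total += term
    return total

def falling(n, k):
    """Factorial power n^(k) = n (n-1) ... (n-k+1)."""
    out = 1
    for j in range(k):
        out *= n - j
    return out

def l3_power_product(xs):
    """(3)/n - 3(21)/n^(2) + 2(111)/n^(3)."""
    n = len(xs)
    return (power_product(xs, (3,)) / n
            - 3 * power_product(xs, (2, 1)) / falling(n, 2)
            + 2 * power_product(xs, (1, 1, 1)) / falling(n, 3))

def k3_direct(xs):
    """n * sum (x - mean)^3 / ((n - 1)(n - 2))."""
    n = len(xs)
    mean = Fraction(sum(xs), n)
    return Fraction(n, (n - 1) * (n - 2)) * sum((x - mean) ** 3 for x in xs)

data = [2, 3, 5, 7, 11, 13]   # illustrative observations
print(l3_power_product(data) == k3_direct(data))
```

The enumeration is deliberately naive; it is meant to mirror the definition,
not to be efficient.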

Perhaps a few remarks are in order in regard to the proof of the relationship
expressed above. The annihilator, corresponding to \Omega, for seminvariants in
terms of roots is, see [2; 230-31],

D = \sum_{i=1}^{n} \frac{\partial}{\partial \alpha_i}.


It is easy to see that 


nRp'W ••• P « T ’)1 1 v ( * 


*1 ^*2 
- Pa 


vV \ p. - i, 


and also that, 

(pPpP ■ ■ • pI-V 0) = (n - P + l)(pr‘ • 

nW 

Since 


pl-i l ) _ (pl 1 


vUV) 


7,(p—1) 


0 \ - )n( - P>) l — 1 = ^ t TT.Piipd'W ■ ■ ■ (p<r i ~ 1 (p< -!)•■■ (p.V‘, 

L J n p i -1 
and


(viY l (v>y 2 ■ • • (p.-O^'KO) = n(pi) r '(pt) Tl (p.-i ) t, ~ 1 _ (P‘) n • ‘• fo-i) r, ~ 






n*" 


it becomes evident that corresponding to any power sum seminvariant there
exists a power product seminvariant with the same numerical coefficients. The
converse is also true.


2. Unbiased Estimates of Rational Integral Moment Functions. If \tau
represents a population parameter, and if t represents such a function of n
observations that the expected value of t is equal to \tau, then t is said to be
an unbiased estimate of \tau. See Tschuprow [11; 74-75], Bertilsen [8; 144], and
Fisher [12].

Let (p_1 p_2 \cdots p_s) denote a power product computed from a sample, the
sample being from an infinite population. Then it is well known that

E[(p_1 p_2 \cdots p_s)] = n^{(s)} \mu'_{p_1} \mu'_{p_2} \cdots \mu'_{p_s},

n being the number of items in the sample. If E^{-1} be interpreted as "unbiased
estimate of," the above relation may also be written

(19)    E^{-1}[\mu'_{p_1} \mu'_{p_2} \cdots \mu'_{p_s}] = \frac{(p_1 p_2 \cdots p_s)}{n^{(s)}},

and it is seen at once that the power product seminvariants defined in section 1,
if computed from a sample of n observations, are the unbiased estimates of the
corresponding power sum seminvariants of the infinite population from which
the sample is drawn.
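The expectation underlying (19) can be checked exactly on a toy population.
The following Python sketch assumes a hypothetical population in which the
values 0, 1 and 3 are equally likely, enumerates every ordered sample of
size n = 3, and compares the average of a power product with n^{(s)} times the
product of the corresponding population moments about the origin. All names and
numbers are illustrative assumptions.

```python
from fractions import Fraction
from itertools import permutations, product

def power_product(xs, parts):
    """Power product (p1 ... ps): sum over ordered tuples of distinct
    observations (distinct positions, not distinct values)."""
    total = Fraction(0)
    for idx in permutations(range(len(xs)), len(parts)):
        term = Fraction(1)
        for i, p in zip(idx, parts):
            term *= Fraction(xs[i]) ** p
        total += term
    return total

def falling(n, k):
    """Factorial power n^(k) = n (n-1) ... (n-k+1)."""
    out = 1
    for j in range(k):
        out *= n - j
    return out

population = [0, 1, 3]   # equally likely values (assumed for illustration)
n = 3                    # sample size

def population_moment(r):
    """Moment of order r about the origin for the toy population."""
    return Fraction(sum(v ** r for v in population), len(population))

def expected_power_product(parts):
    """Exact expectation of (p1 ... ps) over all equally likely samples."""
    samples = list(product(population, repeat=n))
    return sum(power_product(s, parts) for s in samples) / len(samples)

def predicted(parts):
    """Right-hand side n^(s) * mu'_p1 * ... * mu'_ps."""
    value = Fraction(falling(n, len(parts)))
    for p in parts:
        value *= population_moment(p)
    return value

print(expected_power_product((2, 1)) == predicted((2, 1)))
```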

This provides an algebraic interpretation as well as a different approach to a
topic which has already aroused considerable interest among statisticians. In
1927 Bertilsen [8; 144] gave the estimates of the first four Thiele seminvariants
of the population in terms of Thiele seminvariants of the sample. In 1929
R. A. Fisher [12] also obtained these results and gave in addition the estimates
of the fifth and sixth Thiele seminvariants. His results are in terms of sample
moments. In 1937, P. S. Dwyer [13; 26] gave the estimates of the first five
population central moments and indicated also means for obtaining the estimate
of any rational integral isobaric moment function.
In the remainder of this chapter

(1) Dwyer's method will be extended and perhaps somewhat simplified,

(2) certain properties of this type of estimate will be pointed out,

(3) estimates of all seminvariants of weight ≤ 8 will be made available.


3. Computation of Estimates. From the relationship (19) it is possible to
write down immediately in a simple, although not immediately useful, form the
estimate of any rational integral moment function. Thus the fourth Thiele
seminvariant is given by

\lambda_4 = \mu'_4 - 4\mu'_3\mu'_1 - 3\mu'^2_2 + 12\mu'_2\mu'^2_1 - 6\mu'^4_1,

so that the estimate of \lambda_4 is

L_4 = \frac{(4)}{n} - \frac{4(31)}{n^{(2)}} - \frac{3(22)}{n^{(2)}} + \frac{12(211)}{n^{(3)}} - \frac{6(1111)}{n^{(4)}}.

Since power products are difficult to compute directly, it is necessary to
express the estimates in terms of power sums. Dwyer [6; 30-33] gave a
complete discussion of the problem of expanding power products in terms of power
sums and also gave tables of power products in terms of power sums for
weights ≤ 6. By use of (12) it is also possible to use tables giving monomial
symmetric functions in terms of power sums. One table by J. R. Roe [14;
plate 18] includes all cases of weight ≤ 10.

By use of such a table we find

(31) = -(4) + (3)(1),

(22) = -(4) + (2)(2),

(211) = 2(4) - 2(3)(1) - (2)(2) + (2)(1)^2,

(1111) = -6(4) + 8(3)(1) + 3(2)(2) - 6(2)(1)^2 + (1)^4.

If these results are substituted in L_4 above and like terms are collected, it is
found that

n^{(4)} L_4 = n^2(n + 1)(4) - 4n(n + 1)(3)(1) - 3n(n - 1)(2)^2 + 12n(2)(1)^2 - 6(1)^4,

a result which agrees with that given by R. A. Fisher [12].
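The passage from power products to power sums can be checked numerically. The
Python sketch below evaluates L_4 in two ways on made-up data: once from the
power product form, by brute force enumeration, and once from the power sum
expression for n^{(4)} L_4. It assumes that (p_1 ... p_s) sums over ordered sets
of distinct observations and that n^{(k)} is the factorial power; the helper
names are hypothetical.

```python
from fractions import Fraction
from itertools import permutations

def power_product(xs, parts):
    """Power product (p1 ... ps) by brute force over distinct positions."""
    total = Fraction(0)
    for idx in permutations(range(len(xs)), len(parts)):
        term = Fraction(1)
        for i, p in zip(idx, parts):
            term *= Fraction(xs[i]) ** p
        total += term
    return total

def falling(n, k):
    """Factorial power n^(k) = n (n-1) ... (n-k+1)."""
    out = 1
    for j in range(k):
        out *= n - j
    return out

def l4_from_power_products(xs):
    """(4)/n - 4(31)/n^(2) - 3(22)/n^(2) + 12(211)/n^(3) - 6(1111)/n^(4)."""
    n = len(xs)
    return (power_product(xs, (4,)) / n
            - 4 * power_product(xs, (3, 1)) / falling(n, 2)
            - 3 * power_product(xs, (2, 2)) / falling(n, 2)
            + 12 * power_product(xs, (2, 1, 1)) / falling(n, 3)
            - 6 * power_product(xs, (1, 1, 1, 1)) / falling(n, 4))

def l4_from_power_sums(xs):
    """The collected form in power sums (r) = sum of x^r."""
    n = len(xs)
    s1, s2, s3, s4 = (sum(Fraction(x) ** r for x in xs) for r in (1, 2, 3, 4))
    numerator = (n * n * (n + 1) * s4 - 4 * n * (n + 1) * s3 * s1
                 - 3 * n * (n - 1) * s2 ** 2 + 12 * n * s2 * s1 ** 2
                 - 6 * s1 ** 4)
    return numerator / falling(n, 4)

data = [1, 2, 4, 8, 9, 15]   # illustrative observations, n >= 4 required
print(l4_from_power_products(data) == l4_from_power_sums(data))
```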


4. The Dwyer Double Expansion Theorem. The Dwyer double expansion
theorem, [6; 34] and [11; 37-39], states that if any isobaric sum of power products
of weight r indicated by

(20)    \sum \frac{r!}{(q_1!)^{\chi_1} \cdots (q_t!)^{\chi_t} \chi_1! \cdots \chi_t!} b_{q_1^{\chi_1} \cdots q_t^{\chi_t}} (q_1^{\chi_1} \cdots q_t^{\chi_t})

be expanded in terms of power sums in a form indicated by

(21)    \sum \frac{r!}{(p_1!)^{\pi_1} \cdots (p_s!)^{\pi_s} \pi_1! \cdots \pi_s!} a_{p_1^{\pi_1} \cdots p_s^{\pi_s}} (p_1)^{\pi_1} \cdots (p_s)^{\pi_s},

then the coefficient a_r of the power sum (r) is given by

(22)    a_r = \sum (-1)^{\rho-1} \frac{(\rho-1)!\, r!}{(p_1!)^{\pi_1} \cdots (p_s!)^{\pi_s} \pi_1! \cdots \pi_s!} b_{p_1^{\pi_1} \cdots p_s^{\pi_s}}

and that the coefficient a_{r_1 r_2 \cdots r_m} of (r_1)(r_2) \cdots (r_m) is

(23)    a_{r_1 r_2 \cdots r_m} = \overline{a_{r_1} a_{r_2} \cdots a_{r_m}}.

The barred product indicates a symbolic multiplication by suffixing of
subscripts which is exemplified by

\overline{a_3 a_2} = \overline{(b_3 - 3b_{21} + 2b_{111})(b_2 - b_{11})}
                   = b_{32} - b_{311} - 3b_{221} + 5b_{2111} - 2b_{11111} = a_{32}.

The application of this theorem to the present problem eliminates the use of
tables and permits the independent computation of the coefficient of any
particular product of power sums in the expansion in terms of power sums of any
given estimate. The illustration given by Dwyer [13; 39, 40] exemplifies both of
these points very well.
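The two rules of the theorem lend themselves to a short mechanical check. The
Python sketch below is one possible reading of (22) and (23), assuming that the
sum in (22) runs over all partitions of r and that \rho is the number of parts;
a linear form in the b's is stored as a dictionary from partitions to integer
coefficients. The representation and the function names are illustrative
choices, not Dwyer's notation.

```python
from collections import Counter
from math import factorial

def partitions(r, largest=None):
    """Yield the partitions of r as non-increasing tuples."""
    if largest is None:
        largest = r
    if r == 0:
        yield ()
        return
    for first in range(min(r, largest), 0, -1):
        for rest in partitions(r - first, first):
            yield (first,) + rest

def a_single(r):
    """Coefficient a_r of the power sum (r) as a linear form in the b's,
    returned as {partition: integer coefficient}."""
    form = {}
    for part in partitions(r):
        rho = len(part)                      # number of parts
        denom = 1
        for p, mult in Counter(part).items():
            denom *= factorial(p) ** mult * factorial(mult)
        form[part] = ((-1) ** (rho - 1) * factorial(rho - 1)
                      * (factorial(r) // denom))
    return form

def barred_product(*forms):
    """Symbolic product: multiply coefficients and suffix the subscripts."""
    result = {(): 1}
    for form in forms:
        nxt = {}
        for left, c1 in result.items():
            for right, c2 in form.items():
                key = tuple(sorted(left + right, reverse=True))
                nxt[key] = nxt.get(key, 0) + c1 * c2
        result = nxt
    return result

a32 = barred_product(a_single(3), a_single(2))
print(a32)
```

Applied to a_3 and a_2 the barred product returns the five coefficients of the
worked example, with the two contributions to b_{2111} already combined.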

5. Estimates of all Seminvariants of Weight ≤ 8. If the estimates of any
complete system of seminvariants and all products of these seminvariants up
to and including weight r are known, then the estimates of all seminvariants
of weight ≤ r are obtainable as a linear combination of these known estimates.
For example, suppose that we know the estimates of all Thiele seminvariants
of weight ≤ 5 and wish to find the estimate of \mu_5. Since
\mu_5 = \lambda_5 + 10\lambda_3\lambda_2,

E^{-1}[\mu_5] = M_5 = E^{-1}[\lambda_5] + 10E^{-1}[\lambda_3\lambda_2] = L_5 + 10L_{32}.

In table II are given the estimates of all Thiele seminvariants and all products
of Thiele seminvariants of weight ≤ 8. From this table the expressions for L_5
and L_{32} are obtained and, by taking the combination indicated above, it is
seen that

n^{(5)} M_5 = (n^4 - 5n^3 + 10n^2)(5) - 5(n^3 - 5n^2 + 10n)(4)(1)
            - 10(n^2 - 2n)(3)(2) + 10(n^2 - 4n + 8)(3)(1)^2
            + 30(n - 2)(2)^2(1) - 10n(2)(1)^3 + 4(1)^5,

a result which checks with that given by Dwyer [13; 27]. In similar fashion
the estimate of any other seminvariant of weight ≤ 8 can be obtained by use
of table II.
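The estimate of \mu_5 can be checked without reference to table II. The Python
sketch below assumes the usual expansion
\mu_5 = \mu'_5 - 5\mu'_4\mu'_1 + 10\mu'_3\mu'^2_1 - 10\mu'_2\mu'^3_1 + 4\mu'^5_1
and applies (19) term by term, then compares the result on made-up data with
the power sum expression for n^{(5)} M_5, taking the coefficient of (3)(2) to be
-10(n^2 - 2n). The helper names and the data are hypothetical.

```python
from fractions import Fraction
from itertools import permutations

def power_product(xs, parts):
    """Power product (p1 ... ps) by brute force over distinct positions."""
    total = Fraction(0)
    for idx in permutations(range(len(xs)), len(parts)):
        term = Fraction(1)
        for i, p in zip(idx, parts):
            term *= Fraction(xs[i]) ** p
        total += term
    return total

def falling(n, k):
    """Factorial power n^(k) = n (n-1) ... (n-k+1)."""
    out = 1
    for j in range(k):
        out *= n - j
    return out

def m5_from_power_products(xs):
    """Term-by-term estimate of mu_5 via relation (19)."""
    n = len(xs)
    return (power_product(xs, (5,)) / n
            - 5 * power_product(xs, (4, 1)) / falling(n, 2)
            + 10 * power_product(xs, (3, 1, 1)) / falling(n, 3)
            - 10 * power_product(xs, (2, 1, 1, 1)) / falling(n, 4)
            + 4 * power_product(xs, (1, 1, 1, 1, 1)) / falling(n, 5))

def m5_from_power_sums(xs):
    """Collected power sum form of n^(5) M_5, divided by n^(5)."""
    n = len(xs)
    s1, s2, s3, s4, s5 = (sum(Fraction(x) ** r for x in xs) for r in range(1, 6))
    numerator = ((n ** 4 - 5 * n ** 3 + 10 * n ** 2) * s5
                 - 5 * (n ** 3 - 5 * n ** 2 + 10 * n) * s4 * s1
                 - 10 * (n ** 2 - 2 * n) * s3 * s2
                 + 10 * (n ** 2 - 4 * n + 8) * s3 * s1 ** 2
                 + 30 * (n - 2) * s2 ** 2 * s1
                 - 10 * n * s2 * s1 ** 3
                 + 4 * s1 ** 5)
    return numerator / falling(n, 5)

data = [1, 2, 4, 8, 9, 15]   # illustrative observations, n >= 5 required
print(m5_from_power_products(data) == m5_from_power_sums(data))
```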

6. Computation Checks. There are a number of checks which can be applied
to the entries in table II. These may be of interest simply as properties of the
estimates, and they may be of use in correcting errors which may possibly have
crept into the tables.

When any power product of more than one part is expanded into power
sums, the sum of the numerical coefficients of the expansion is zero. To prove
this we need only consider a set of observations of which one observation is
unity and the rest are all zero. Then any power product of two or more parts
is necessarily zero and all power sums are equal to unity. Hence the initial
statement of the paragraph follows immediately.
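This check is easy to mechanize. The Python sketch below expands a power
product into power sums by summing over the set partitions of its parts, each
block of size b contributing the factor (-1)^{b-1}(b-1)!, and then adds up the
coefficients. This route through set partitions is a standard device assumed
here for illustration; it is not the table look-up used in the text, and the
function names are hypothetical.

```python
from collections import Counter
from math import factorial

def set_partitions(items):
    """Yield all partitions of a list into non-empty blocks."""
    if not items:
        yield []
        return
    first, rest = items[0], items[1:]
    for smaller in set_partitions(rest):
        # put `first` into each existing block in turn ...
        for i in range(len(smaller)):
            yield smaller[:i] + [[first] + smaller[i]] + smaller[i + 1:]
        # ... or into a block of its own
        yield [[first]] + smaller

def expand(parts):
    """Expand the power product (p1 ... ps) into products of power sums.
    Returns {tuple of power sum orders (non-increasing): coefficient}."""
    out = Counter()
    for blocks in set_partitions(list(parts)):
        coeff = 1
        for block in blocks:
            coeff *= (-1) ** (len(block) - 1) * factorial(len(block) - 1)
        key = tuple(sorted((sum(b) for b in blocks), reverse=True))
        out[key] += coeff
    return dict(out)

expansion_211 = expand((2, 1, 1))
print(expansion_211, sum(expansion_211.values()))
```

For (211) the sketch returns the coefficients 2, -2, -1 and 1 of (4), (3)(1),
(2)(2) and (2)(1)^2, whose sum is zero as the check requires.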

From this fact it is apparent that the sum of the coefficients of L_r is 1/n, and
the sum of the coefficients of L_{r_1 r_2 \cdots r_s} is zero. Thus for L_4 we have

\frac{n^3 + n^2 - 4(n^2 + n) - 3(n^2 - n) + 12n - 6}{n^{(4)}} = \frac{1}{n},

and for L_{22} the sum of the coefficients is




(s + «s — t«)s- 
(2)4 9 («* - 14fl J +95ft* - 322n+420) I -3(«‘ - 16n J + 104n* - 305n + -3 (3ft 3 - 33n 3 + 128s - 168) n ( - I8n s + 125ft 1 - 384ft + 441
A condition satisfied by the coefficients of any seminvariant is that their sum
is equal to zero (see section 2). This provides another check on the entries of
table II, although the seminvariant must be written in homogeneous form
before the check is applied. Thus we may write

L_4 = \frac{n^3}{n^{(4)}} \left[ (n + 1)\frac{(4)}{n} - 4(n + 1)\frac{(3)(1)}{n^2} - 3(n - 1)\frac{(2)^2}{n^2} + 12n\frac{(2)(1)^2}{n^3} - 6n\frac{(1)^4}{n^4} \right]

and the sum of coefficients is

(n + 1) - 4(n + 1) - 3(n - 1) + 12n - 6n = 0.

Several checks arise from the fact (see section 6) that every seminvariant 
must be annihilated by the operator 


(24) SI' = 

i-l 03 i 

Another check results from the discussion of the next section and is so apparent 
as to need no comment. 

All the checks mentioned in this section are applicable to the estimate of any 
seminvariant. 


7. Estimates as Sums of Simple Seminvariants. A seminvariant such as L_4
in which the coefficients of the m''s are functions of n will be called a composite
seminvariant, while a seminvariant in which the coefficients of the m''s are
purely numerical will be called simple. The fact that is to be established in
this section is that every composite seminvariant is the sum of simple
seminvariants. As an illustration consider L_4. It is apparent that

L_4 = \frac{n^4}{n^{(4)}} l_4 + \frac{n^3}{n^{(4)}} k_4,

where l_4 and k_4 are seminvariants of the sample corresponding to \lambda_4 and
\kappa_4. Both l_4 and k_4 are simple seminvariants.

That a composite seminvariant may always be expressed as a sum of simple
seminvariants can be demonstrated by considering the effect of \Omega', (24), on a
composite seminvariant. The coefficients are polynomials in n and are
unaffected by the operator. The expression resulting from application of the
operator can vanish only if the coefficient of n^r vanishes for every r. Thus a
composite seminvariant which has r different powers of n appearing in its
coefficients is expressible as the sum of r simple seminvariants, which are not
necessarily distinct. Table III exhibits the estimates of Thiele seminvariants of
weight ≤ 6 as sums of simple seminvariants.

Since the factors appearing in front of each of the simple seminvariants in
the expression resulting from breaking down a composite seminvariant are of
successively lower order with respect to n, it is possible to obtain
approximations of various orders to the value of an estimate by using the
appropriate portion of the expression given in the table.
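The breakdown of L_4 into simple seminvariants can be verified on a sample. The
Python sketch below assumes that l_4 = m_4 - 3m_2^2 and k_4 = m_4 + 3m_2^2, with
m_2 and m_4 the sample moments about the sample mean, and that L_4 is formed
from power products over distinct observations divided by factorial powers of
n. These readings, the data and the function names are assumptions made for the
illustration.

```python
from fractions import Fraction
from itertools import permutations

def power_product(xs, parts):
    """Power product (p1 ... ps) by brute force over distinct positions."""
    total = Fraction(0)
    for idx in permutations(range(len(xs)), len(parts)):
        term = Fraction(1)
        for i, p in zip(idx, parts):
            term *= Fraction(xs[i]) ** p
        total += term
    return total

def falling(n, k):
    """Factorial power n^(k) = n (n-1) ... (n-k+1)."""
    out = 1
    for j in range(k):
        out *= n - j
    return out

def l4_estimate(xs):
    """L_4 in power product form."""
    n = len(xs)
    return (power_product(xs, (4,)) / n
            - 4 * power_product(xs, (3, 1)) / falling(n, 2)
            - 3 * power_product(xs, (2, 2)) / falling(n, 2)
            + 12 * power_product(xs, (2, 1, 1)) / falling(n, 3)
            - 6 * power_product(xs, (1, 1, 1, 1)) / falling(n, 4))

def central_moment(xs, r):
    """r-th sample moment about the sample mean."""
    mean = Fraction(sum(xs), len(xs))
    return sum((x - mean) ** r for x in xs) / len(xs)

def l4_decomposed(xs):
    """(n^4 / n^(4)) l_4 + (n^3 / n^(4)) k_4 with simple seminvariants l_4, k_4."""
    n = len(xs)
    m2, m4 = central_moment(xs, 2), central_moment(xs, 4)
    small_l4 = m4 - 3 * m2 ** 2
    small_k4 = m4 + 3 * m2 ** 2
    return (Fraction(n ** 4, falling(n, 4)) * small_l4
            + Fraction(n ** 3, falling(n, 4)) * small_k4)

data = [3, 4, 4, 9, 10, 17]   # illustrative observations
print(l4_estimate(data) == l4_decomposed(data))
```

Dropping the second term gives the leading approximation (n^4 / n^{(4)}) l_4
mentioned at the end of the section.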

8. The Estimates of the \kappa's. The seminvariant \kappa_r possesses an
interesting property which will be called invariance under estimate. By this is
meant that the estimate of \kappa_r is k_r multiplied by a suitable factor. In
particular, \kappa_2 = \mu_2 and \kappa_3 = \mu_3 and it is well known that

E^{-1}[\kappa_2] = \frac{n^2}{n^{(2)}} m_2,    E^{-1}[\kappa_3] = \frac{n^3}{n^{(3)}} m_3,

so that \kappa_r certainly possesses the property for r = 2 and 3. It can be
shown, however, that

(25)    K_{2r} = \frac{n^2}{n^{(2)}} k_{2r},    K_{2r+1} = \frac{n^3}{n^{(3)}} k_{2r+1}.

From (15)

\kappa_{2r} = \frac{1}{2} \sum_{i=0}^{2r} (-1)^i \binom{2r}{i} \mu'_i \mu'_{2r-i}
            = \mu'_{2r} + \frac{1}{2} \sum_{i=1}^{2r-1} (-1)^i \binom{2r}{i} \mu'_i \mu'_{2r-i},

so that

K_{2r} = \frac{(2r)}{n} + \frac{1}{2} \sum_{i=1}^{2r-1} (-1)^i \binom{2r}{i} \frac{(i, 2r-i)}{n^{(2)}}.

By the Binet-Waring identities [15; 6-7]

(26)    (ab) = (a)(b) - (a + b)

and this holds for power products regardless of the values of a and b. Hence

K_{2r} = \frac{(2r)}{n} + \frac{1}{2} \sum_{i=1}^{2r-1} (-1)^i \binom{2r}{i} \frac{(i)(2r-i) - (2r)}{n^{(2)}}.

Since

\sum_{i=0}^{2r} (-1)^i \binom{2r}{i} = 0,

the coefficient of (2r)/n above is n/(n - 1) and it follows immediately that

K_{2r} = \frac{n^2}{n^{(2)}} \cdot \frac{1}{2} \sum_{i=0}^{2r} (-1)^i \binom{2r}{i} \frac{(i)(2r-i)}{n^2} = \frac{n^2}{n^{(2)}} k_{2r}.

This proves the first half of (25) and the second half can be proved in similar
fashion, although with considerably more difficulty.
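The first half of (25) can be tested on a sample for particular weights. The
Python sketch below assumes that
\kappa_{2r} = (1/2) \sum_i (-1)^i \binom{2r}{i} \mu'_i \mu'_{2r-i} with \mu'_0 = 1,
forms its estimate from one part and two part power products, and compares the
result with n^2 / n^{(2)} times the same expression in the sample moments. The
definition of \kappa_{2r} used here, the data and the function names are
assumptions made for the illustration.

```python
from fractions import Fraction
from itertools import permutations
from math import comb

def power_product(xs, parts):
    """Power product (p1 ... ps) by brute force over distinct positions."""
    total = Fraction(0)
    for idx in permutations(range(len(xs)), len(parts)):
        term = Fraction(1)
        for i, p in zip(idx, parts):
            term *= Fraction(xs[i]) ** p
        total += term
    return total

def falling(n, k):
    """Factorial power n^(k) = n (n-1) ... (n-k+1)."""
    out = 1
    for j in range(k):
        out *= n - j
    return out

def kappa_estimate(xs, r):
    """K_{2r}: unbiased estimate of kappa_{2r} in power product form."""
    n, w = len(xs), 2 * r
    total = power_product(xs, (w,)) / n
    for i in range(1, w):
        weight = Fraction((-1) ** i * comb(w, i), 2)
        total += weight * power_product(xs, (i, w - i)) / falling(n, 2)
    return total

def kappa_sample(xs, r):
    """k_{2r}: the same seminvariant computed from sample moments about the origin."""
    n, w = len(xs), 2 * r
    def m(j):
        return sum(Fraction(x) ** j for x in xs) / n   # m(0) equals 1
    return Fraction(1, 2) * sum((-1) ** i * comb(w, i) * m(i) * m(w - i)
                                for i in range(w + 1))

data = [1, 3, 4, 8, 13]   # illustrative observations
n = len(data)
factor = Fraction(n * n, falling(n, 2))
print(kappa_estimate(data, 2) == factor * kappa_sample(data, 2))
```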
9. Other Simple Seminvariants which are Invariant under Estimate. It has
been previously remarked (Chapter I, section 2) that the \kappa system of
seminvariants are the seminvariants of minimum degree, those of even weight being
of second degree and those of odd weight being of third degree. The \kappa_{2r}'s
are the only seminvariants of degree 2, but for odd weights greater than 7, there
exist more than one seminvariant of degree 3. It is not difficult to show that
these additional minimum degree seminvariants are also invariant under estimate.
The type of proof used could have been applied equally well to obtain the results
of the preceding section and indicates that the property of invariance under
estimate which is possessed by the \kappa's is a direct result of their minimum
degree property.

Consider the estimate in power product form of any seminvariant of degree 3
and odd weight. Power products of 1, 2 and 3 parts will appear. By the
Binet-Waring identities each three part power product (abc) yields a third degree
power sum product (a)(b)(c) plus other products of lower degree. Since (a)(b)(c)
comes only from (abc) its coefficient must be identical with that of (abc) and will
therefore be a constant divided by n^{(3)}. The coefficient of each second degree
product of power sums will be a sum of terms, the first of which comes from the
corresponding two part power product with a coefficient identical with that of the
power product, and the others come from the three part power products. Then
the coefficient of a second degree product of power sums must be of the form

\frac{c_1}{n^{(2)}} + \frac{c_2 + c_3 + \cdots + c_s}{n^{(3)}} = \frac{c_1 n + c'}{n^{(3)}}.

Similarly the coefficient of the first degree power sum term will be of the form

\frac{d_1 n^2 + d_2 n + d_3}{n^{(3)}}.

Since the estimate of a seminvariant is a seminvariant, it follows that d , a 0. 
This is true because the coefficient of ( ~W must be the coefficiftnt of to 

immediately 7 posribl^t^br^the^LTO 0 ’* 01 * ^ C0I ! trary be assumed is 
seminvariants" the taTbSSf^TSS TT™' 

it follows S?ny se^iv^nt degme V" 
estimate. It is also apparent that the Ltor^^^^ 

10. Composite Seminvariants which are Invariant under Estimate. For
each weight r ≥ 4 there exists a composite seminvariant which is invariant
under estimate. For weights 4 and 5 this seminvariant is easily obtained by use
of Table III. Thus for weight 4, form the seminvariant \lambda_4 + c_{22}\lambda_2^2.
From the table we find that

E^{-1}[\lambda_4 + c_{22}\lambda_2^2]
    = \frac{n^4}{n^{(4)}} l_4 + \frac{n^3}{n^{(4)}} k_4 + c_{22}\frac{n^4}{n^{(4)}} l_2^2 - c_{22}\frac{n}{(n-2)^{(2)}} k_4
    = \frac{n^4}{n^{(4)}} (l_4 + c_{22} l_2^2) + \frac{n}{n^{(4)}} (n^2 - n^{(2)} c_{22}) k_4.

If c_{22} = n^2/n^{(2)} the seminvariant is invariant under estimate. This
seminvariant is

(27)    \psi_4 = \lambda_4 + \frac{n^2}{n^{(2)}} \lambda_2^2.

In similar fashion we find for weight 5

(28)    \psi_5 = \lambda_5 + \frac{5n^2}{n^{(2)}} \lambda_3\lambda_2.
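The invariance of the weight 4 and weight 5 combinations can be examined
numerically. The Python sketch below writes the Thiele seminvariants as forms in
the moments about the origin, builds \lambda_4 + c\lambda_2^2 and
\lambda_5 + c'\lambda_3\lambda_2 with c = n^2/n^{(2)} and c' = 5n^2/n^{(2)}, and
compares the power product estimate with n^r / n^{(r)} times the same form in
the sample moments. The multipliers n^4/n^{(4)} and n^5/n^{(5)}, the data, and
all function names are assumptions made for this illustration.

```python
from fractions import Fraction
from itertools import permutations

def power_product(xs, parts):
    """Power product (p1 ... ps) by brute force over distinct positions."""
    total = Fraction(0)
    for idx in permutations(range(len(xs)), len(parts)):
        term = Fraction(1)
        for i, p in zip(idx, parts):
            term *= Fraction(xs[i]) ** p
        total += term
    return total

def falling(n, k):
    """Factorial power n^(k) = n (n-1) ... (n-k+1)."""
    out = 1
    for j in range(k):
        out *= n - j
    return out

# Thiele seminvariants as forms in moments about the origin: {partition: coefficient}
LAMBDA = {
    2: {(2,): 1, (1, 1): -1},
    3: {(3,): 1, (2, 1): -3, (1, 1, 1): 2},
    4: {(4,): 1, (3, 1): -4, (2, 2): -3, (2, 1, 1): 12, (1, 1, 1, 1): -6},
    5: {(5,): 1, (4, 1): -5, (3, 2): -10, (3, 1, 1): 20, (2, 2, 1): 30,
        (2, 1, 1, 1): -60, (1, 1, 1, 1, 1): 24},
}

def times(f, g):
    """Product of two forms (subscripts are concatenated)."""
    out = {}
    for p, a in f.items():
        for q, b in g.items():
            key = tuple(sorted(p + q, reverse=True))
            out[key] = out.get(key, 0) + a * b
    return out

def plus(f, g, c=1):
    """The form f + c * g."""
    out = dict(f)
    for key, b in g.items():
        out[key] = out.get(key, 0) + c * b
    return out

def estimate(form, xs):
    """Unbiased estimate: each moment product replaced by power product / n^(s)."""
    n = len(xs)
    return sum(c * power_product(xs, p) / falling(n, len(p)) for p, c in form.items())

def plug_in(form, xs):
    """The same form evaluated in the sample moments about the origin."""
    n = len(xs)
    total = Fraction(0)
    for p, c in form.items():
        term = Fraction(c)
        for r in p:
            term *= sum(Fraction(x) ** r for x in xs) / n
        total += term
    return total

def psi4(n):
    return plus(LAMBDA[4], times(LAMBDA[2], LAMBDA[2]), Fraction(n * n, falling(n, 2)))

def psi5(n):
    return plus(LAMBDA[5], times(LAMBDA[3], LAMBDA[2]), Fraction(5 * n * n, falling(n, 2)))

data = [1, 2, 4, 8, 9, 15]   # illustrative observations
n = len(data)
print(estimate(psi4(n), data) == Fraction(n ** 4, falling(n, 4)) * plug_in(psi4(n), data))
```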


For weights > 5 considerably more difficulty is encountered. For weight 6,
for example, we consider the seminvariant

\lambda_6 + c_{42}\lambda_4\lambda_2 + c_{33}\lambda_3^2 + c_{222}\lambda_2^3.

By use of table III we obtain

E^{-1}[\lambda_6 + c_{42}\lambda_4\lambda_2 + c_{33}\lambda_3^2 + c_{222}\lambda_2^3]
    = \frac{n^6}{n^{(6)}} (l_6 + c_{42} l_4 l_2 + c_{33} l_3^2 + c_{222} l_2^3) + \Phi,

where \Phi is a sum of other seminvariants with coefficients which are functions of
n and c_{42}, c_{33}, c_{222}. Now there are only four linearly independent
seminvariants of weight 6 and it is necessary that one of these involve the term
(1)^6/n^6. By an argument analogous to that of the previous section this term
cannot appear in \Phi and therefore \Phi is expressible in terms of three or fewer
seminvariants. Actually three are necessary and equating the coefficients of
these to zero the values of c_{42}, c_{33} and c_{222} are uniquely determined.
The result is somewhat lengthy and scarcely of sufficient interest to record here.

The same sort of procedure can be used for determining seminvariants of
higher order which are invariant under estimate, but the labor of computation
becomes very great.

It is possible to obtain moment functions which are invariant under estimate
by means of a set of equations given by Dwyer [13; 38-39]. These equations
connect the coefficients of a general isobaric moment function and the coefficients
of the expected value of that function. In his notation if, for example,

f = a_4(4) + 4a_{31}(3)(1) + 3a_{22}(2)^2 + 6a_{211}(2)(1)^2 + a_{1111}(1)^4,

then

E[f] = b_4 n\mu'_4 + 4b_{31} n^{(2)}\mu'_3\mu'_1 + 3b_{22} n^{(2)}\mu'^2_2 + 6b_{211} n^{(3)}\mu'_2\mu'^2_1 + b_{1111} n^{(4)}\mu'^4_1,





56 


PAUL Tj. DUESSETj 


wherein:

(29)    a_4 + 4a_{31} + 3a_{22} + 6a_{211} + a_{1111} = b_4,
        a_{31} + 3a_{211} + a_{1111} = b_{31},
        a_{22} + 2a_{211} + a_{1111} = b_{22},
        a_{211} + a_{1111} = b_{211},
        a_{1111} = b_{1111}.


E 


The problem at hand demands that 

' ( 4 ) , , , ( 3 )( 1 ) , 0 a ( 2) 3 , c „ 5 „ ( 2 )( 1) 2 , 4 ( l ) 4 

na i — -f 4n a 3 i - , - + 3n an —r + on am —,-r n <*uu —r . 

w n 2 n 2 « a J 

= X[wffl4nl + 4n < ' , aaipapi + 3ra 12> a52Pa ! + 67i !3) aau/isMi a + n t4) flimMi 4 ] 


so that the equations (29) become

n^4 a_{1111} = \lambda n^{(4)} a_{1111},
n^3 a_{211} = \lambda n^{(3)} (a_{211} + a_{1111}),
n^2 a_{22} = \lambda n^{(2)} (a_{22} + 2a_{211} + a_{1111}),
n^2 a_{31} = \lambda n^{(2)} (a_{31} + 3a_{211} + a_{1111}),
n a_4 = \lambda n (a_4 + 4a_{31} + 3a_{22} + 6a_{211} + a_{1111}),

and from these equations a_4, a_{31}, a_{22}, a_{211} can be found in terms of
a_{1111}. Obviously there is only one solution if none of the a's are zero. In
general, for any weight r, a similar system of equations can be found and they
determine the coefficients of a moment function of weight r which is invariant
under estimate. It appears that this moment function is always a seminvariant
although no proof of the fact has been found. The moment functions of weight 4,
5 and 6 obtained by this method are identical with \psi_4, \psi_5 and \psi_6
defined above.


Conclusion. The results of this paper include:

1. A demonstration of the fact that the theory of statistical seminvariants is
identical with the theory of algebraic seminvariants.

2. The introduction of new statistical seminvariants.

3. Simplification of the computation of estimates.

4. Proof that the estimate of any seminvariant is also a seminvariant.

5. Proof of the existence of a trio of seminvariants with the same numerical
coefficients.

6. A discussion of seminvariants which are invariant under estimate.

Many thanks are due Professor P. S. Dwyer for his able guidance in the
preparation of this paper and to Professors C. C. Craig and J. A. Nyswander for
helpful comments.
THE ERRORS INVOLVED IN EVALUATING CORRELATION
DETERMINANTS

By Paul G. Hoel


1. Introduction. Many statistical problems require for their solution the
evaluation of correlation determinants. The method usually employed for such
evaluation is that of Chio,^1 in which the order of the determinant is reduced by
successive operations with selected pivotal elements. The repeated
multiplications and subtractions involved in the method necessitate rounding off
the elements in the successively reduced determinants. The calculated value of the
original determinant is therefore in error; and so the question naturally arises
as to the magnitude of this error.
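To make the source of the error concrete, the following Python sketch carries
out pivotal condensation on the upper left element, once exactly and once with
rounding after each reduction. It relies on the standard property that one
reduction of a determinant of order n multiplies its value by the (n - 2) power
of the pivot, so the accumulated pivot powers are divided out at the end. The
4 x 4 correlation matrix, the two decimal rounding and the function names are
illustrative assumptions, not data from the paper.

```python
from fractions import Fraction

def chio_step(matrix, digits=None):
    """One pivotal condensation on the upper-left element: returns the array of
    order one less with entries a11*a_ij - a_i1*a_1j, optionally rounded."""
    pivot = matrix[0][0]
    size = len(matrix)
    reduced = [[pivot * matrix[i][j] - matrix[i][0] * matrix[0][j]
                for j in range(1, size)]
               for i in range(1, size)]
    if digits is not None:
        reduced = [[round(v, digits) for v in row] for row in reduced]
    return reduced

def chio_determinant(matrix, digits=None):
    """Determinant by repeated condensation, dividing out the pivot powers."""
    divisor = Fraction(1)
    current = matrix
    while len(current) > 1:
        order = len(current)
        divisor *= current[0][0] ** (order - 2)   # pivot^(order - 2)
        current = chio_step(current, digits)
    return current[0][0] / divisor

def laplace_determinant(m):
    """Independent check by cofactor expansion along the first row."""
    if len(m) == 1:
        return m[0][0]
    return sum((-1) ** j * m[0][j]
               * laplace_determinant([row[:j] + row[j + 1:] for row in m[1:]])
               for j in range(len(m)))

def F(text):
    return Fraction(text)

# Illustrative correlation matrix (assumed values).
R = [[F("1"), F("0.5"), F("0.3"), F("0.2")],
     [F("0.5"), F("1"), F("0.4"), F("0.1")],
     [F("0.3"), F("0.4"), F("1"), F("0.6")],
     [F("0.2"), F("0.1"), F("0.6"), F("1")]]

exact = chio_determinant(R)
rounded = chio_determinant(R, digits=2)
print(float(exact), float(rounded), float(rounded - exact))
```

The difference printed last is the quantity whose magnitude the paper sets out
to bound.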

Previous attempts to answer this question seem to be satisfied with finding
an upper bound for the magnitude of the difference between the value of the
original determinant and its value after its elements have been rounded off.
Moreover, this bound is expressed in terms of the errors in the elements and the
minors of the original determinant, whose values are assumed to be known
exactly from calculation. However, several reductions are often needed before
the value of the determinant can be obtained; and furthermore the minors are
subject to the same type of errors as the determinant itself. The problem,
therefore, is to find an upper bound for the magnitude of the difference between
the final calculated value of the determinant and the determinant itself which
involves only calculated quantities.

This paper treats the problem from two different points of view. In the first 
part an upper bound is obtained for the magnitude of the error. In the second 
part the first order error terms are given more detailed consideration, with the 
result that an upper probability bound is obtained for the error. 


2. Absolute Bounds. Consider the correlation determinant A = |r_{ij}|. To
evaluate A by the method of Chio, it is convenient to select diagonal elements
as pivots. It will be assumed without loss of generality that the upper left
diagonal element is always chosen as the pivotal element in each reduction.
After each reduction, elements are rounded off to a fixed decimal accuracy.
Let a^k_{ij} represent the element i,j after the k-th reduction, x^k_{ij} the
difference between the rounded value of element a^k_{ij} and a^k_{ij} itself.
After k reductions, we arrive at the determinant

F^k = | a^k_{ij} + x^k_{ij} |,    i, j = k+1, \ldots, n.


^1 See, for example, Whittaker and Robinson, Calculus of Observations, p. 71.
By treating F^k as a function of the x^k_{ij}, it may be expanded by Taylor's
formula as follows:

(1)    F^k = A^k + \sum_{i,j=k+1}^{n} x^k_{ij} A^k_{ij} + \frac{1}{2!} \sum_{i,j=k+1}^{n} \sum_{p,q=k+1}^{n} x^k_{ij} x^k_{pq} A^k_{ijpq} + \cdots,

where A^k is the value of F^k for all x^k_{ij} zero, A^k_{ij} is the cofactor of
a^k_{ij} in A^k, etc.

For a determinant of order n, the value of the determinant obtained after a
single reduction is the value of the original determinant multiplied by the
n - 2 power of the pivotal element used. Applying this to F^k, it follows that

A^k = (a^{k-1}_{kk} + x^{k-1}_{kk})^{n-k-1} F^{k-1} = H_k^{n-k-1} F^{k-1},
A^k_{ij} = H_k^{n-k-2} F^{k-1}_{ij},
A^k_{ijpq} = H_k^{n-k-3} F^{k-1}_{ijpq},

etc., where the exponents of H_k are ordinary exponents rather than notation.
Substituting in (1),

F^k = H_k^{n-k-1} F^{k-1} + H_k^{n-k-2} \sum x^k_{ij} F^{k-1}_{ij} + \frac{H_k^{n-k-3}}{2!} \sum\sum x^k_{ij} x^k_{pq} F^{k-1}_{ijpq} + \cdots.

In order to express F^k in terms of the original determinant, this expansion
will be condensed by means of the following operational notation.

(2)    F^k = (1 + D + D^2 + \cdots + D^{n-k}) H_k^{n-k-1} F^{k-1},

where D^i operates on H_k^{n-k-1} F^{k-1} by reducing the exponent of
H_k^{n-k-1} by i units, by summing from k + 1 to n the product of i terms in x^k
with the corresponding cofactors of F^{k-1}, and dividing the result by
factorial i. Using this as a recursion formula,

F^k = (1 + D + \cdots + D^{n-k}) H_k^{n-k-1} (1 + D + \cdots + D^{n-k+1}) H_{k-1}^{n-k} \cdots (1 + D + \cdots + D^{n-1}) H_1^{n-2} F^0.

However,

F^0 = | a^0_{ij} + x^0_{ij} | = A,

since we assume that x^0_{ij} = 0 for our original determinant. Consequently,

(3)    F^k = (1 + \cdots + D^{n-k}) H_k^{n-k-1} (1 + \cdots + D^{n-k+1}) H_{k-1}^{n-k} \cdots (1 + \cdots + D^{n-1}) H_1^{n-2} A.

Since D^i operates on F^{k-1} in (2) to extract the proper cofactor of i less
rows than in F^{k-1}, which in turn reduces the exponent of all factors H_{k-1}
in the expansion of F^{k-1} by i units, D^i reduces the exponent of all H's
following it in the expansion of F^k in (3) by i units.
Following these rules of operation, and expanding so as to collect terms of the
same degree in the x's, we may write

(4)    F^k = H_k^{n-k-1} \cdots H_1^{n-2} A + H_k^{n-k-2} \cdots H_1^{n-3} (terms in x_{ij})
             + H_k^{n-k-3} \cdots H_1^{n-4} (terms in x_{ij} x_{pq}) + \cdots.

Letting H = H_k H_{k-1} \cdots H_1 and C = H_k^{n-k-1} \cdots H_1^{n-2}, we may write

F^k = C \left[ A + \frac{1}{H} (terms in x_{ij}) + \frac{1}{H^2} (terms in x_{ij} x_{pq}) + \cdots \right]

and hence

(5)    J = \frac{F^k}{C} - A = \frac{1}{H} (terms in x_{ij}) + \frac{1}{H^2} (terms in x_{ij} x_{pq}) + \cdots.

Now J is the difference between the calculated value of A, using Chio's
reduction method and rounding off after each reduction, and the true value of A.
We are interested in finding an upper bound for the magnitude of J. To
accomplish this we shall first overestimate the number of terms in the various
sums of (5), then find an upper bound for the magnitude of the terms in these
sums, and finally combine the two results.
In counting terms by means of (3), we may ignore the H's since they merely
serve as coefficients of the x's. Therefore consider the nature of the terms in

(1 + \cdots + D^{n-k})(1 + \cdots + D^{n-k+1}) \cdots (1 + \cdots + D^{n-1}) A.

Now (1 + \cdots + D^s)A contains the sums \sum_{i,j=n-s+1}^{n} x_{ij} A_{ij},
\frac{1}{2!} \sum\sum x_{ij} x_{pq} A_{ijpq}, etc.; hence it contains s^2 terms in
x_{ij}, s^2(s-1)^2/2 terms in x_{ij} x_{pq}, etc. Each of these is not greater
than s^2, \binom{s^2}{2}, etc.; consequently, the number of terms of each type
is not greater than the coefficient of the corresponding power of D in the
expansion of (1 + D)^{s^2}. Therefore,

(6)    (1 + D)^{(n-k)^2} (1 + D)^{(n-k+1)^2} \cdots (1 + D)^{(n-1)^2} = (1 + D)^m,

where m = (n - k)^2 + \cdots + (n - 1)^2, contains at least as many terms of each
type as are found in the expansion of F^k. This gives us the desired overestimate
of the number of terms in the various sums of (5).

In finding upper bounds for the magnitudes of terms, it is to be noted that (4)
is written with all common factors extracted from each set of terms of the same
degree in the x's. In the parenthesis containing terms consisting of the product
of r x's, the first sum will have unity for its coefficient while the last sum will
have H_k H_{k-1} \cdots H_1 as coefficient, with all sums between having as
coefficients products of H's with exponents \le r. Hence an upper bound for all
coefficients in this parenthesis may be written as H^r, where H is the magnitude
of the product of those H's whose magnitude is greater than unity, but unity if
none exceeds unity. Now terms in x_{ij} are multiplied by A_{ij}, those in
x_{ij} x_{pq} by A_{ijpq}, etc.; therefore let A_{ij}, A_{ijpq}, etc., be the
absolute values of the largest in magnitude of such cofactors. With this notation
for upper bounds for magnitudes of terms, and (6) giving an upper bound for the
number of terms, we may write an upper bound for the magnitude of J as follows:

(7) I J I < (|«) nAA* + (| +..., 


where e > |x| is the maximum error of rounding. This result is valid for any
determinant with real elements. All quantities on the right are available from
calculations except the A; consequently this upper bound will be useful only if
satisfactory bounds exist for the minors of the determinant. It can be shown
that (7) holds for any minor of A, say A_{uv}, if the A have uv added as
subscripts; and therefore it may be applied to the question of the accuracy of
least square solutions.

For the correlation determinant A it can be shown that the magnitude of a 
minor of order n — k is bounded by M/2 ik for k even and fci/2 i(t_1) for k odd. 


Setting a = and substituting these bounds in (7), 


| J | < am -f a m Gi — + a 3 m Ci + a m Ct — + • ■ • 


, aW . a i m i , o*m 4 , 
<am+— + — + — + 


( 8 ) 


2 2 8 3 

. .am, am 

< am -p —— =—r--r, 

2 2(1 — am) 


for am < 1. Since am is obtainable from the calculations for A, this is the 
desired upper bound for the error in question. 


3. Probability Bounds. In order to find probability bounds for this error,
it will be necessary to expand the H's since they involve the variables x.
Consider H_k = a^{k-1}_{kk} + x^{k-1}_{kk}. Since a^{k-1}_{kk} came from repeated
reductions of A, it is expressible in terms of the x's and the minors of A. To
obtain this expansion of H_k consider

I afll+u-j+i + xfcl+i k-j+i I 


<?* = 


awb *r xu 


Using the same methods as for F k , this may be written as 

G‘ = B' + £ x^’Bh + I 2E + • • •, 

h—i+l 21 fc-j+l 
where B‘ is the value of G‘ for all x k ' zero, etc., and where B ’ = 

B\, - HkZlGi* 1 , etc. Substituting, 

g’ = mz\G ,+i + mzi e %;?g\v +1 mz] ee + • • ■ 

Using operational notation here also, this may be written as 

<r = (1 + E + E 2 + ■ • - + E^HiPlG^ 1 , 

where the E’s operate the same as the D’s, except that sums are taken from 
k - s + 1 to k rather than from n — s + 1 to n. Treating this as a recursion 
formula, 


H k = G 1 = (1 + E)HU(1 + E + E*)HU 
However, 


(?* = 


dll + *11 


Oil 

• 


■ 

OLkk + Xkk 


dkk 


(1 + ■ • + E’ , ~ 1 )Hr 2 G\ 


= A k . 


+ E k ~ x )H ^ 2 Ak. 


Consequently, 

(9) Hj, = (1 + £0//Li(1 + E + E 2 )Hl~! ■■■ (1 + 

Since the E’s operate on the following H’s to reduce their exponents, the number
of terms of various types, that is, of various degrees in the x’s, will not be
decreased if the order of H’s is disregarded and their exponents held fixed.
Therefore consider

(10) H' k = (1 + £0(1 + E + E 2 ) ... (1 + ... -f E h ^)A k Ill_ x . .. h’T 2 

as an ordinary recursion formula in the H’s for overestimating the number of
terms of various types. If (10) is substituted for successive H’s within itself
in a systematic manner until no H’s remain, it will be found that

(11) # - (1 +*)••■ (1 + • • + E k ~’)A k 

K1+ *) ••(!+•• + . • [(1 + E)Aif k ~ i [Ax] 84 ' 1 . 

To merely count terms it is permissible to combine like terms to give 

H * - (1 + E r n2H +2 ‘-\l +E + , .(!+.,.+ E ^)K 

= (1 + Ef-\ 1 + E + E'f-' ... (i + ... + 

!£"* K ! S *!!r daot J flr the A ’ S - Since the W* l^e the D% the same 
arguments as those used to arrive at (6) may be used to replace (1 +B+ . 

r t J or weref mating the number of terms. Hence, the number 

f terms of various types in H k is not greater than those in 

(1 + ^‘“(l + £) 2, - ! ‘-‘ ... (i + £)<*-»* *» (l + £)(*-«* = (1 + E yk t 
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where Wk — 2 fc 3 + 2 2 2 k 4 -|- ■ • • -j- [k — 2) 2 • 2° + (k — l) 2 . Therefore the 
number of terms of various types in. Hk~ k ~ l - ■ ■ HI"' 1 is not greater than in 

(12) (1 -j_ "+(n-2)i»i jj,y 

It is easily shown that t can be condensed into the form

(13) $t = [2^{k-2}(n-k) - 1] + 2^2[2^{k-3}(n-k) - 1] + \cdots + (k-1)^2[2^0(n-k) - 1].$

From (3) it is evident that the number of terms of various types in $F_k$ will not
be greater than those in the expansion of $F_k$ when the exponents of the H’s
are held fixed. But from (6) we have an upper bound for the number of terms
arising from the D’s, and from (12) those arising from the H’s; hence the number
of terms in question will certainly be bounded by those in

(14) $(1 + D)^{m+t} = (1 + D)^{\mu}.$

Now consider the magnitude of terms. The terms arising from the operation
of D’s contain minors of A as factors, while those arising from the operation
of E’s contain minors of $A_i$, where i ranges from 1 to k. Let $A'_{ij}$, etc., denote
an upper bound for the magnitudes of all such minors of the same number of
subscripts. It is easily shown that A' with 2r subscripts is not less than the
magnitude of the product of several minors whose subscripts total 2r in number.
The terms of various types also contain as factors products of the constant
terms in the H’s. The constant term in $H_k$, which will be denoted by $h_k$,
can be obtained from (11) by operating with all ones since it will be unaffected
by disregarding the order of operation. Hence,

hi, = A*A*_jAfc_j • • • A 2 A 2 

Since the A, are principal minors of a positive definite determinant with no 
element greater than unity, h k has unity for an upper bound. Thus, an upper 
bound for the magnitude of any term in the product of i x’s will be t times A' 
with 2 % subscripts. 

With upper bounds now available for the number of terms and the magnitudes
of terms, we are in a position to consider the complete expansion of I in
which the coefficients of the x’s will be constants rather than H’s. Evidently
the terms in $x_{ij}$ will come from the terms in $x_{ij}$ of (4) with the H’s replaced by
the constant terms in their expansions. If Z denotes these terms, then


Z = hi 


b-*-j 


(15) 


hi 3 Au + hk 23 zff 1 


4- ■ • • 4- hk 


• ■ • M 23 £i/A(/j. 


Now consider an upper bound for | I - Z |. Since I - Z involves only terms
in the product of two or more x’s, we need consider an upper bound for such
terms only. From the results of the two preceding paragraphs, we obtain

1 1 - Z | < A.',„ 4 t\C» A.' JP|U v + • • ■ . 
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But from the paragraph containing (8), bounds are available for the A'; hence 
\I~Z\<>\C i + t \C 3 ^ + t \C i j i + ... 


2 2 3 3 

< 1 £ J- € p 

~ 2 T 2(1 - 41) 




for εμ < 1. Since Z is of order ε, Φ will ordinarily be small compared with Z;
therefore consider the nature of the distribution of Z.

If we write $Z = a_1x_1 + \cdots + a_px_p$, then, since the x’s are independently

distributed with rectangular distributions, it is easily shown that ^ - 
a 

2 E «3 = o, = 3 - ! £ aJ/(E a?) 2 . If the a, are approximately equal 

in magnitude, then a A is approximately equal to 3 — 1/p. But from (15) 
V > ~ + ■ ■ + \{n - l) 2 , which is sufficiently large for determinants 

employing Chio’s method to justify the assumption that Z is approximately
normally distributed. Setting L = /i" - * -2 ... h?~ s , 




l(n — A :) 2 + ... + (« - l ) 2 ~ $((n ft) + ... -f (n - l) 2 j] 


< ~ j^(?i - ft) 2 + ... + ( n - l) 2 - ~ (2n - ft - 1) J = ¥ 2 . 

Hence, the probability is >.95 that | Z | < 2Ψ. Since | I - Z | < Φ, the
probability is >.95 that | I | < 2Ψ + Φ; and therefore the probability is >.95
that

(16) $|J| < \dfrac{2\Psi + \Phi}{C}.$

This inequality will usually give a smaller bound for | J | than (8). However,
when A is small the H’s may be small, with the result that C will be small
and (16) may not give a satisfactory bound for | J |. In such cases the bound
given by (8) may not prove satisfactory either.


4. Example. Consider a correlation determinant of order 7 in which the
elements are accurate to 4 decimal places. If Chio’s reduction method is
applied until a 2-rowed determinant is obtained, then n = 7, k = 5, ε = .00005,
m = 90, μ = 176, Ψ = .00005 √(160/3), and we obtain from (8) that


Ul< 



•0°45 + f -) .00001 + 



.00000005 
1 - .0045 ff/H 
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where H/H is obtained from calculations involved in evaluating the deter¬ 
minant, From (16) we obtain that the probability is >.95 that 


m < 


.0008 
C ' 


The relative advantage of the second inequality over the first depends on the 
size of the pivotal elements, as does the usefulness of either inequality. 
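The exponents quoted in the example can be checked with a short script. The sketch below is illustrative only: it assumes that m is the sum of squares (n - k)^2 + ... + (n - 1)^2 (an assumption that reproduces the quoted m = 90), takes t from (13) with the powers of 2 running from k - 2 down to 0, and sets μ = m + t as in (14). The function name is hypothetical.

```python
def term_count_exponents(n, k):
    """Return (m, t, mu) for an n-rowed determinant reduced k times.

    Assumption: m is the sum of squares (n-k)^2 + ... + (n-1)^2.
    t follows (13); mu = m + t follows (14).
    """
    m = sum(i * i for i in range(n - k, n))
    t = sum(j * j * (2 ** (k - 1 - j) * (n - k) - 1) for j in range(1, k))
    return m, t, m + t


m, t, mu = term_count_exponents(7, 5)
eps = 0.00005          # maximum rounding error for 4-decimal elements
print(m, t, mu)        # exponents used in the example
print(eps * mu < 1)    # the condition under which the bound is stated
```

With n = 7 and k = 5 the script gives m + t = 176, which agrees with the value of μ used in the example.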

University of California at Los Angeles 



THE CUMULATIVE NUMBERS AND THEIR POLYNOMIALS 

By P. S. Dwyer 

In a recent paper [1] the author has shown how the moments of a distribution
can be obtained from the last entries of cumulative columns with the use of
multiplication by certain numbers. These numbers may be called "cumulative
numbers." It is the aim of this paper to show how these numbers can be
obtained from the expansion of $x^s$ in terms of factorials of the s-th order and to
demonstrate properties of the polynomials of which these numbers are the
coefficients.


TABLE 1 


Successive Frequency Cumulations 


 (1)     (2)    (3)    (4)     (5)      (6)      (7)      (8)
  X       x     f_x    C^1     C^2      C^3      C^4      C^5

a + 6     6      64     64      64       64       64       64
a + 5     5     192    256     320      384      448      512
a + 4     4     240    496     816     1200     1648     2160
a + 3     3     160    656    1472     2672     4320     6480
a + 2     2      60    716    2188     4860     9180    15660
a + 1     1      12    728    2916     7776    16956    32616
a         0       1    729    3645    11421    28377    60993
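The columns of Table 1 can be regenerated by repeated running sums taken from the top of the frequency column. The following is a minimal sketch; the function name is hypothetical, and the frequencies are those listed in column (3).

```python
from itertools import accumulate


def successive_cumulations(freqs_top_down, orders):
    """Return the cumulation columns C^1, ..., C^orders.

    Each column is the running sum (from the top) of the previous one,
    starting from the frequency column itself.
    """
    columns = []
    column = list(freqs_top_down)
    for _ in range(orders):
        column = list(accumulate(column))
        columns.append(column)
    return columns


freqs = [64, 192, 240, 160, 60, 12, 1]   # f_x for x = 6, 5, ..., 0
cols = successive_cumulations(freqs, 5)
for order, col in enumerate(cols, start=1):
    print(f"C^{order}:", col)
```

The last entry of each column is the term numbered i = 1 (counting from the bottom), so `cols[1][-1]` is the entry written $C_1^2$ in the text.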


1. The values $C_i^j(u_x)$. We use the notation $C_i^j(u_x)$ of the previous paper
[1, 289] to express the columnar cumulated entries. The j indicates the order
of the cumulation while the i indicates the number of the term, counting from
the bottom of the column. Thus in Table 1, which presents the cumulations
of a frequency distribution used in the previous paper [1, 289], $C_1^1 = 729$; $C_1^2 = 3645$;
$C_2^2 = 2916$; ..., $C_4^5 = 6480$, etc. Now if k + 1 values of x are spaced at
unit distances and if the smallest value of x is 0, it can be shown that



\[ C_1^2 = \sum_0^k (x+1)\,u_x; \quad C_2^2 = \sum_0^k x\,u_x; \quad C_1^3 = \sum_0^k \frac{(x+2)(x+1)}{2!}\,u_x; \]

\[ C_2^3 = \sum_0^k \frac{(x+1)x}{2!}\,u_x; \quad C_3^3 = \sum_0^k \frac{x(x-1)}{2!}\,u_x \]

and, in general, for j > 0 and j + 1 ≥ i,

(1) \[ C_i^{j+1} = \sum_{x=0}^{k} \frac{(x+j-i+1)^{(j)}}{j!}\,u_x . \]

Similarly if k values of x are spaced at unit distances and if the smallest value
of x is 1, it can be shown that

\[ C_1^2 = \sum_1^k x\,u_x; \quad C_2^2 = \sum_1^k (x-1)\,u_x; \quad C_1^3 = \sum_1^k \frac{(x+1)x}{2!}\,u_x; \]

\[ C_2^3 = \sum_1^k \frac{x(x-1)}{2!}\,u_x; \quad C_3^3 = \sum_1^k \frac{(x-1)(x-2)}{2!}\,u_x \]

and, in general, for j > 0 and j + 1 ≥ i,

(2) \[ C_i^{j+1} = \sum_{x=1}^{k} \frac{(x+j-i)^{(j)}}{j!}\,u_x . \]

It is to be noted that the coefficients of $u_x$ in (2) could be obtained from the
coefficients of $u_x$ in (1) by the substitution x + 1 = x'.


2. The powers in terms of factorials of the s-th order. If the s-th powers can
be expressed in terms of factorials of the s-th order (factorials having s factors)
then the moments can be expressed in terms of the cumulations. For example

\[ x^2 = \frac{(x+1)x + x(x-1)}{2!}, \quad \text{so, from (1),} \]

\[ \sum_0^k x^2 f_x = \sum_0^k \frac{(x+1)x}{2!} f_x + \sum_0^k \frac{x(x-1)}{2!} f_x = C_2^3 + C_3^3 . \]

And since

\[ x^3 = \frac{(x+2)^{(3)} + 4(x+1)^{(3)} + x^{(3)}}{3!}, \quad \text{we have} \]

\[ \sum_0^k x^3 f_x = \sum_0^k \frac{(x+2)^{(3)}}{3!} f_x + 4\sum_0^k \frac{(x+1)^{(3)}}{3!} f_x + \sum_0^k \frac{x^{(3)}}{3!} f_x = C_2^4 + 4C_3^4 + C_4^4 . \]

In general if

(3) \[ x^s = \frac{A_{s1}(x+s-1)^{(s)} + A_{s2}(x+s-2)^{(s)} + \cdots + A_{sj}(x+s-j)^{(s)} + \cdots + A_{ss}x^{(s)}}{s!}, \]

then

(4) \[ \sum_0^k x^s f_x = A_{s1}C_2^{s+1} + A_{s2}C_3^{s+1} + \cdots + A_{sj}C_{j+1}^{s+1} + \cdots + A_{ss}C_{s+1}^{s+1}, \]

while if the smallest value of x is 1, we have

(5) \[ \sum_1^k x^s f_x = A_{s1}C_1^{s+1} + A_{s2}C_2^{s+1} + \cdots + A_{sj}C_j^{s+1} + \cdots + A_{ss}C_s^{s+1}. \]

These quantities, $A_{sj}$, in (4) and (5) are simply the coefficients of certain
factorials of the s-th order in the expansion of $x^s s!$.

These numbers, for small values of s, are easily obtained. It is possible to
use the table and a recursion formula of a previous paper [1, 294-295] for larger
values of s. It is also possible to obtain these values, without involving
cumulative theory, from (3) above.

While doing this we make a more general approach by expanding $(a + x)^s$
in terms of these same factorials with the coefficients now functions of a. This
is possible if we add an additional term, $A_{s0}(x + s)^{(s)}$, to the numerator of the
right hand side of (3). We have then


(6) \[ (a+x)^s = \frac{A_{s0}(x+s)^{(s)} + A_{s1}(x+s-1)^{(s)} + \cdots + A_{sj}(x+s-j)^{(s)} + \cdots + A_{ss}x^{(s)}}{s!} . \]

The determination of the values $A_{sj}$ can be accomplished by purely algebraic
means by successive substitution of x = 0, 1, 2, ..., s. In this way we obtain
s + 1 equations in s + 1 unknowns. For example when s = 2

\[ (a+x)^2 = \frac{A_{20}(x+2)^{(2)} + A_{21}(x+1)^{(2)} + A_{22}x^{(2)}}{2!} \]

so that when x = 0, 1, 2, we have

\[ a^2 = A_{20}; \quad (a+1)^2 = 3A_{20} + A_{21}; \quad (a+2)^2 = 6A_{20} + 3A_{21} + A_{22}. \]

The solution is $A_{20} = a^2$, $A_{21} = 2ab + 1$, $A_{22} = b^2$ where b = 1 - a. It
follows that

\[ (a+x)^2 = a^2\frac{(x+2)^{(2)}}{2!} + (2ab+1)\frac{(x+1)^{(2)}}{2!} + b^2\frac{x^{(2)}}{2!} \quad \text{and hence that} \]

\[ \sum_0^k (a+x)^2 f_x = a^2C_1^3 + (2ab+1)C_2^3 + b^2C_3^3 \]

as indicated in the previous paper [1, 293].

When a = 0, then b = 1 and we have $\sum x^2 f_x = C_2^3 + C_3^3$, while when
a = 1, b = 0 and the right hand side becomes $C_1^3 + C_2^3$.

It follows that the general cumulative numbers might also be defined as the
solutions of the s + 1 equations in the s + 1 unknowns obtained by placing
x = 0, 1, 2, ..., s in (6).
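The successive substitution x = 0, 1, ..., s produces a triangular system, because the factorial $(x+s-j)^{(s)}$ vanishes whenever j > x. A minimal sketch of that forward substitution follows, using exact rational arithmetic; the function names are hypothetical and the expansion solved is the one written in (6).

```python
from fractions import Fraction
from math import factorial


def falling(z, s):
    """The factorial of the s-th order: z(z-1)...(z-s+1)."""
    out = 1
    for i in range(s):
        out *= z - i
    return out


def cumulative_numbers(a, s):
    """Solve the s + 1 equations from x = 0, 1, ..., s for A_s0, ..., A_ss."""
    a = Fraction(a)
    coeffs = []
    for x in range(s + 1):
        rhs = factorial(s) * (a + x) ** s
        # Contributions of the coefficients already found (j < x).
        known = sum(coeffs[j] * falling(x + s - j, s) for j in range(x))
        # The j = x term has factor s^(s) = s!; terms with j > x vanish.
        coeffs.append((rhs - known) / factorial(s))
    return coeffs


a = Fraction(1, 3)
b = 1 - a
print(cumulative_numbers(a, 2))   # compare with a^2, 2ab + 1, b^2
print(cumulative_numbers(0, 3))   # the case a = 0, s = 3
```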


3. The evaluation of the cumulative numbers. Formal algebraic methods of
evaluating equations (6) are somewhat tedious so we use finite difference theory
to aid in finding the solution. As in the previous paper [1] we use the notation

\[ \nabla v_x = v_x - v_{x-1} \quad \text{and} \quad y_x = \begin{cases} v_x & \text{when } a \le x \le a + k \\ 0 & \text{otherwise.} \end{cases} \]

We then write, from (6)

(7) \[ s!\,(a+x)^s = A_{s0}(x+s)^{(s)} + A_{s1}(x+s-1)^{(s)} + \cdots + A_{sj}(x+s-j)^{(s)} + \cdots + A_{ss}x^{(s)} . \]

We note further that $\nabla^{s+1}(x+r)^{(s)} = s!$ when x = s - r and 0 otherwise. We have then

(8) \[ \nabla^{s+1}(a+j)^s = A_{sj}. \]

It has been shown in the previous paper [1, 292] that

(9) \[ \nabla^{s+1}(a+j)^s = \sum_{t=0}^{j} (-1)^t \binom{s+1}{t} (a+j-t)^s \]

and it appears that the cumulative numbers could be defined by (9). A useful
recursion formula has been derived from (9):

(10) \[ \nabla^{s+1}(a+x)^s = (a+x)\,\nabla^{s}(a+x)^{s-1} + (s+1-a-x)\,\nabla^{s}(a+x-1)^{s-1}. \]
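Formula (9) and the recursion (10) are easy to compare numerically. The sketch below evaluates (9) exactly with rational arithmetic and then checks that the values satisfy (10) for a sample value of a; the function name and the sample value a = 1/3 are illustrative choices, not part of the paper.

```python
from fractions import Fraction
from math import comb


def nabla_power(a, s, j):
    """Evaluate (9): sum over t = 0..j of (-1)^t C(s+1, t) (a + j - t)^s."""
    a = Fraction(a)
    return sum((-1) ** t * comb(s + 1, t) * (a + j - t) ** s
               for t in range(j + 1))


def satisfies_recursion(a, s):
    """Check (10) for x = 0, 1, ..., s at the given a and s (s >= 2)."""
    a = Fraction(a)
    for x in range(s + 1):
        left = nabla_power(a, s, x)
        right = ((a + x) * nabla_power(a, s - 1, x)
                 + (s + 1 - a - x) * nabla_power(a, s - 1, x - 1))
        if left != right:
            return False
    return True


print([nabla_power(0, 3, j) for j in range(4)])   # cumulative numbers, a = 0
print(satisfies_recursion(Fraction(1, 3), 4))
```

For j = -1 the sum in (9) is empty, so the function returns 0, which is the boundary value the recursion needs at x = 0.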

4. The cumulative polynomials. We define the cumulative polynomials to
be the polynomials obtained by using the cumulative numbers as coefficients.
Thus when a = 0,

\[ P_1 = y; \quad P_2 = y + y^2; \quad P_3 = y + 4y^2 + y^3; \quad P_4 = y + 11y^2 + 11y^3 + y^4; \text{ etc.} \]

It is possible to derive a recursion formula for these polynomials. We use
(10) with s replaced by s + 1 and a = 0 and get

(11) \[ P_{s+1} = \sum \nabla^{s+2}(x)^{s+1}y^x = \sum x\,\nabla^{s+1}(x)^s y^x + \sum (s+2-x)\,\nabla^{s+1}(x-1)^s y^x \]

which becomes, after some manipulation,

(12) \[ P_{s+1} = (1-y)\sum x\,\nabla^{s+1}(x)^s y^x + (s+1)\,y\,P_s . \]

To illustrate we get $P_4$ from $P_3 = y + 4y^2 + y^3$. Now $\sum x\nabla^{s+1}(x)^s y^x = y + 8y^2 + 3y^3$
and $P_4 = (1-y)(y + 8y^2 + 3y^3) + 4y(y + 4y^2 + y^3) = y + 11y^2 + 11y^3 + y^4$.
The recursion formula (12) can be expressed also in the form of a
differential equation, since $P'_s = \frac{d}{dy}(P_s) = \sum x\nabla^{s+1}(x)^s y^{x-1}$, as

(13) \[ P_{s+1} = y[(1-y)P'_s + (s+1)P_s]. \]

It can be shown more generally that for any a

\[ P_{a,0} = 1; \quad P_{a,1} = a + by; \quad P_{a,2} = a^2 + (2ab+1)y + b^2y^2, \text{ etc., with} \]

(14) \[ P_{a,s+1} = y(1-y)P'_{a,s} + [a(1-y) + (s+1)y]P_{a,s} \]

as the recursion formula.
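For a = 0 the differential form of the recursion, $P_{s+1} = y[(1-y)P'_s + (s+1)P_s]$, can be applied mechanically to coefficient lists. A minimal sketch, assuming a polynomial is stored as a list whose i-th entry is the coefficient of $y^i$ (the function name is hypothetical):

```python
def next_polynomial(p, s):
    """Given the coefficients of P_s, return those of P_{s+1} (case a = 0)."""
    deriv = [i * c for i, c in enumerate(p)][1:]          # coefficients of P_s'
    inner = []
    for i in range(len(p)):
        d_i = deriv[i] if i < len(deriv) else 0           # term from P_s'
        d_prev = deriv[i - 1] if i >= 1 else 0            # term from -y P_s'
        inner.append(d_i - d_prev + (s + 1) * p[i])
    return [0] + inner                                    # multiply by y


polys = {1: [0, 1]}                                       # P_1 = y
for s in range(1, 5):
    polys[s + 1] = next_polynomial(polys[s], s)
for s, p in polys.items():
    print(s, p)
```

Starting from $P_1 = y$, the loop reproduces the polynomials $P_2$, $P_3$ and $P_4$ listed in the text.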


5. The numerator coefficients in successive derivatives of the logistic function.
Lotka has recently exhibited the coefficients of the numerator terms of
successive derivatives of the logistic function [2, 160]. These appear to be, aside
from sign, the same as the cumulative numbers when a = 0. It is shown in
this section that these numbers are the cumulative numbers. The scheme is
generalized to include the numerator coefficients of the derivatives of a more
general function involving the parameter a.


Lotka used the function $\Phi_0 = \dfrac{1}{1 + e^{rt}}$ and obtained

\[ \Phi_1 = \frac{-re^{rt}}{(1+e^{rt})^2}, \quad \Phi_2 = \frac{-r^2e^{rt}(1-e^{rt})}{(1+e^{rt})^3}, \text{ etc.} \]

The numerical coefficients are the same if r = 1 so we might
as well use $\Phi_0 = \dfrac{1}{1 + e^{t}}$.

A more general function is the two parameter function

(15) \[ \Phi_{a,c,0} = \frac{e^{ax}}{1 + ce^x} . \]

Let successive derivatives with respect to x be indicated by $\Phi_{a,c,1}$, $\Phi_{a,c,2}$, ..., $\Phi_{a,c,s}$,
etc. Then

\[ \Phi_{a,c,1} = \frac{e^{ax}[a - c(1-a)e^x]}{(1+ce^x)^2}, \]

\[ \Phi_{a,c,2} = \frac{e^{ax}[a^2 - (-2a^2 + 2a + 1)ce^x + (1-a)^2c^2e^{2x}]}{(1+ce^x)^3} . \]

In general,

\[ \Phi_{a,c,s} = \frac{e^{ax}Q_{a,c,s}}{(1+ce^x)^{s+1}} \]

so that

\[ \Phi_{a,c,s+1} = \frac{e^{ax}\{(1+ce^x)[aQ_{a,c,s} + Q'_{a,c,s}] - (s+1)ce^xQ_{a,c,s}\}}{(1+ce^x)^{s+2}} \]

and

(16) \[ Q_{a,c,s+1} = (1+ce^x)[aQ_{a,c,s} + Q'_{a,c,s}] - (s+1)ce^xQ_{a,c,s}. \]

The Q functions can be changed to polynomials with the substitution $e^x = y$.
Then derivatives are taken with respect to y and

(17) \[ P_{a,c,s+1} = (1+cy)[aP_{a,c,s} + yP'_{a,c,s}] - (s+1)cyP_{a,c,s}. \]

When c = -1, this becomes formula (14) and since $P_{a,c,0} = 1$, it follows that
the numbers of the present section are generalized cumulative numbers. When
c = 1 and a = 0 we have the numbers found by Lotka.

It can be shown, further, that the c coefficient of $y^i$ is $c^i$. It follows that the
absolute values of the coefficients, when c = 1 and when c = -1, are the same.
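The two-parameter recursion $P_{a,c,s+1} = (1+cy)[aP_{a,c,s} + yP'_{a,c,s}] - (s+1)cyP_{a,c,s}$ with $P_{a,c,0} = 1$ can be iterated on coefficient lists to compare the cases c = -1 and c = 1. The sketch below is illustrative; the function name is hypothetical, and the coefficient of $y^i$ in $aP + yP'$ is computed as (a + i) times the coefficient of $y^i$ in P.

```python
from fractions import Fraction


def generalized_polynomial(a, c, s):
    """Coefficients (of y^0, y^1, ...) of P_{a,c,s}, starting from P_{a,c,0} = 1."""
    a, c = Fraction(a), Fraction(c)
    p = [Fraction(1)]
    for step in range(s):                      # builds P_{a,c,step+1}
        new = [Fraction(0)] * (len(p) + 1)
        for i, coef in enumerate(p):
            inner = (a + i) * coef             # y^i coefficient of aP + yP'
            new[i] += inner                    # contribution of 1 * (aP + yP')
            new[i + 1] += c * inner            # contribution of cy * (aP + yP')
            new[i + 1] -= (step + 1) * c * coef
        p = new
    return p


cumulative = generalized_polynomial(0, -1, 3)  # c = -1, a = 0
lotka = generalized_polynomial(0, 1, 3)        # c = 1, a = 0
print(cumulative)
print(lotka)
```

With a = 0 the two coefficient lists agree in absolute value and differ only in sign, which is the relation between the cumulative numbers and Lotka's coefficients described above.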


6. Formulas for $\sum x^s$. A formula for the sums of the s-th powers of the
integers from 1 to k is obtained by summing (3). We get

\[ \sum_1^k x^s = \frac{1}{(s+1)!}\sum_{j=1}^{s} (k+s+1-j)^{(s+1)}\,\nabla^{s+1}(j)^s = \sum_{j=1}^{s} C_j^{s+1}(1)\,\nabla^{s+1}(j)^s . \]

For example

\[ \sum_1^k x^2 = \frac{(k+2)^{(3)} + (k+1)^{(3)}}{3!} = \frac{k(k+1)(2k+1)}{6}, \]

\[ \sum_1^k x^3 = \frac{(k+3)^{(4)} + 4(k+2)^{(4)} + (k+1)^{(4)}}{4!} = \frac{k^2(k+1)^2}{4} . \]

More generally the values of $\sum_a^{a+k} x^s$ can be evaluated by

(21) \[ \sum_a^{a+k} x^s = \frac{1}{(s+1)!}\sum_{j=0}^{s} (k+s+1-j)^{(s+1)}\,\nabla^{s+1}(a+j)^s = \sum_{j=0}^{s} C_{j+1}^{s+1}(1)\,\nabla^{s+1}(a+j)^s . \]
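The power-sum formula for the integers 1 to k can be checked against direct summation. The sketch below assumes s ≥ 1, writes the factorial quotient $(k+s+1-j)^{(s+1)}/(s+1)!$ as a binomial coefficient, and takes the cumulative numbers with a = 0 from the alternating sum $\sum_t (-1)^t \binom{s+1}{t}(j-t)^s$; the function names are hypothetical.

```python
from math import comb


def cumulative_row(s):
    """Cumulative numbers for a = 0: the (s+1)-th differences of j^s, j = 0..s."""
    return [sum((-1) ** t * comb(s + 1, t) * (j - t) ** s for t in range(j + 1))
            for j in range(s + 1)]


def power_sum(k, s):
    """Sum of x^s for x = 1..k (s >= 1) via the cumulative numbers."""
    return sum(comb(k + s + 1 - j, s + 1) * coeff
               for j, coeff in enumerate(cumulative_row(s)))


print(power_sum(10, 2), sum(x ** 2 for x in range(1, 11)))
print(power_sum(10, 3), sum(x ** 3 for x in range(1, 11)))
```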

7. Summary. It is shown how the cumulative numbers and the cumulative 
polynomials may be obtained in a variety of ways. Of special interest is the 
fact that the cumulative numbers can be obtained by expanding powers in 
terms of factorials and hence they might be called factorial coefficients of a 
kind. It is also possible, though it is not within the scope of this paper, to 
establish interesting relations between the cumulative numbers and the
multinomial coefficients, the usual factorial coefficients, the differences of 0, etc.

ENUMERATION AND CONSTRUCTION OF BALANCED INCOMPLETE
BLOCK CONFIGURATIONS 1

By Gertrude M. Cox 

1. Introduction. One of the general problems of experimental design is to 
avoid extraneous effects in making desired comparisons. The method employed 
is to use experimental materials as nearly homogeneous as possible. Such 
materials, however, are seldom available in large quantities. On the contrary,
field soils vary in fertility from block to block, animals vary with both litter and 
sex, and leaves on one young plant differ from those on another. Differences 
between blocks, between litters and sex, and between plants, being irrelevant 
to the comparisons usually contemplated, must be avoided. 

When the number of treatments to be compared is small, well known methods 
of design, such as the Latin square or randomized complete block, are available 
and efficient. As the number of treatments increases, however, these designs 
tend to become less efficient through failure to eliminate heterogeneity.
Furthermore, they become cumbersome, the Latin square design requiring replicates
equal in number to the treatments and the complete block design providing that
each treatment occur in every block. (Blocks are defined as an assemblage of
experimental units chosen to be as nearly alike as possible.)

Because of such limitations, several modifications of the complete block design
have been devised. These new designs all have the common characteristic that
the experimental material is divided into groups or blocks containing fewer units
than the number of treatments to be compared. These more homogeneous
small blocks are referred to as incomplete blocks.

It is desirable to have all comparisons between pairs of treatments made with 
equal accuracy. This requires of the design that every pair of treatments 
occur in the same block an equal number of times. Such a design is referred to 
as balanced. Balanced incomplete block designs can be arranged (for any given
number of treatments) only for certain combinations of block size and number of 
replications. 2 

The construction of balanced incomplete block designs is mathematically a 
part of the theory of configurations. A configuration is an assemblage of 
elements into sets, each element occurring in the same number of sets, and each 

1 A revision of an expository paper presented under a different title at a joint meeting
of the Institute of Mathematical Statistics and Biometric Section of the American
Statistical Association, December 27, 1939.

2 Numerous additional designs are available in the partially balanced incomplete blocks

set containing the same number of elements. The configurations to be considered
here are the complete configurations, i.e., those in which each element
occurs an equal number of times in the same set with every other element. It
would be useful to know (a) what configurations (within the useful range)
exist, (b) how these configurations may be constructed.

The typical requirement of the experimenter is this: "I wish to test t treatments
and can use blocks of size k (t > k). I should like a design which will
involve as little experimental material as feasible." The designer must then
determine what configuration of t elements in sets of k will satisfy the incidence
relation that each pair of elements occur together in a set an equal number of
times, and for which the total number of sets is a minimum. There are still
many configurations which the experimenter needs but which have not as yet
been constructed.

In order better to explain the construction of these balanced incomplete block 
designs, it is essential to specify the underlying combinatorial problems. A 
configuration satisfying the condition of balance can be obtained by writing 
down all possible combinations, b, of the t elements taken k at a time, 

b = tCk = t! / [k! (t - k)!].

The simplest example is that in which each set contains only two elements and 
all possible combinations of the t elements, taken in pairs, appear in the different 
sets. This series of pairs can be written out by the experimenter, and the 
method of analysis is given by Yates [20].

Let us take another example: given six elements to be taken three at a time,

b = 6C3 = 6! / (3! 3!) = 20.

The 20 combinations are,

123    134    146    236    345
124    135    156    245    346
125    136    234    246    356
126    145    235    256    456.


Such unreduced designs are not necessarily economical or feasible in experimental
work. It is often desirable to find some less extensive configuration. In this
example half of the combinations, either those in italics or the other half, fulfill
the restriction that every element occur with every other element in the same
number of sets. Each pair of elements occurs twice in either group of sets.
Thus, a balanced incomplete block design can be based on either half of the
20 sets as well as on all 20.
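The claim about the two halves can be verified by counting pairs. The sketch below enumerates the 20 combinations and tests one particular split into two sets of ten; the split used here is an assumption chosen for illustration (it is one valid half, not necessarily the italicised one of the original table), and the helper name is hypothetical.

```python
from collections import Counter
from itertools import combinations


def pair_counts(blocks):
    """Count how often each unordered pair of elements shares a block."""
    counts = Counter()
    for block in blocks:
        counts.update(combinations(sorted(block), 2))
    return counts


all_blocks = list(combinations(range(1, 7), 3))        # the 20 combinations

# One illustrative half (assumed, not taken from the source).
half = [(1, 2, 3), (1, 2, 4), (1, 3, 5), (1, 4, 6), (1, 5, 6),
        (2, 3, 6), (2, 4, 5), (2, 5, 6), (3, 4, 5), (3, 4, 6)]
other_half = [b for b in all_blocks if b not in half]

for name, blocks in [("all 20", all_blocks), ("half", half), ("other half", other_half)]:
    counts = pair_counts(blocks)
    print(name, len(blocks), sorted(set(counts.values())))
```

Each of the 15 pairs occurs four times in the full set and twice in each half, so either half gives a balanced design with b = 10, r = 5 and each pair occurring twice.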

2. Combinatorial methods. Combinatorial considerations of a simple nature
enable us to set up necessary conditions which balanced designs must satisfy.
We have t elements arranged in b sets of k elements each; each element occurs in r
sets, and each pair of elements occurs together in a set exactly λ times. Then
we must have

tr = bk, r(k - 1) = λ(t - 1).

The first of these equations expresses the fact that the total number of plots
must be equal both to the product of elements by replications and to the product
of sets by number of elements per set; the second, that the number of pairs into
which a given element enters must equal λ times the remaining number of
elements.

It is convenient to write

r = λ(t - 1) / (k - 1),     b = λt(t - 1) / [k(k - 1)].

Since the numbers t, b, r, k, λ must be integers, it is easy to obtain lower limits
for any three in terms of the other two.

To give a general classification, the configurations have been divided into
classes according to the value of λ. Because of the practical limitations in
experimentation, table I has been expanded only to include λ = 6 and the k
values from 1-14. It may be well to call attention to the fact that duplications
occur in the different classes of table I. For instance in the class, λ = 1, for
k = 6, t = 15m + 1, and m = 1, then b = 8, and r = 3. In order to construct a
design, the following condition is necessary: r ≥ k and therefore b ≥ t. In this
example, the condition is met if b, r and λ are multiplied by 2; the resulting design
is t = 16, b = 16, r = 6, k = 6 and λ = 2. This configuration is a duplicate
of the design in the class, λ = 2, for k = 6 and m = 1. In many of the
configurations where λ is 3, 4, 5, or 6, a common factor can be cancelled from b, r and
λ giving a design listed in the classes, λ = 1, 2 or 3.
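The necessary conditions can be screened mechanically for any proposed t, k and λ. The following sketch computes r and b from the two relations tr = bk and r(k - 1) = λ(t - 1), then checks that both are whole numbers and that r is not less than k; the function name is hypothetical, and passing the check does not establish that a design exists.

```python
from fractions import Fraction


def necessary_parameters(t, k, lam):
    """Return (r, b, ok) where ok means r and b are integers and r >= k."""
    r = Fraction(lam * (t - 1), k - 1)
    b = Fraction(lam * t * (t - 1), k * (k - 1))
    integral = r.denominator == 1 and b.denominator == 1
    return r, b, integral and r >= k


print(necessary_parameters(16, 6, 1))   # r = 3 is less than k = 6
print(necessary_parameters(16, 6, 2))   # doubled: t = 16, b = 16, r = 6
print(necessary_parameters(7, 3, 1))    # t = 7, b = 7, r = 3
print(necessary_parameters(22, 7, 2))   # passes the check although no such design exists
```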

It should be emphasized that the conditions under which table I was derived 
are necessary, but not sufficient, for the existence of a complete configuration. 
For example, consider the following configurations which satisfy the necessary 
conditions for a design. 


Sub-class
(table I)       m      t      b      r      k      λ

10m + 5         1     15     21      7      5      2
21m + 1         1     22     22      7      7      2
15m + 6         2     36     42      7      6      1
42m + 1         1     43     43      7      7      1
45m + 10        2    100    110     11     10      1
110m + 1        1    111    111     11     11      1


No configurations of the above specification can actually be constructed. 

A selected group of configurations from table I is given in table II. Only
those configurations whose k, r and λ lie within practical limits, and whose
existence has not been disproved, have been included. The practical limits of
k, r and λ, of course, are dependent upon the conditions surrounding the
experiment. We have chosen to keep k within the range 3 to 10 except for a few special
configurations in which t is greater than 100, in which cases k was allowed to
equal 11-14. Also r has been kept within a similar limited range. (Those
configurations in table II, with an asterisk preceding t, have not been
constructed.)

The above limitations upon k and r give a small, selected group of configurations.
However, many others either have been constructed or are known to
exist. For balanced incomplete block designs, Yates [20] gives the lower limits
of r for t from 4 to 25 and k from 2 to 12 but not greater than ½t. Fisher and
Yates [8] have tabulated the configurations which are known to exist having
ten or less replications including all arithmetically possible configurations the
existence of which has not been disproved.

Even if the existence of a configuration has not been disproved, there still
remains the difficult problem of writing out the elements which are to appear in
each set. Some discussion of the structure of such configurations is presented
by Fisher and Yates [8], by Yates [20, 21], by Goulden [9, 10] and by Bose [4].
Additional descriptions are to follow.

While a search of the literature revealed a number of constructed configurations,
yet the general theory of their formation has received relatively little
consideration. The question of combinations related to the theory of configurations
which is of interest here was first set forth by Kirkman [11] in 1847. He
states the problem thus: "If Q_x denote the greatest number of triads that can be
formed with x symbols, so that no duad shall be twice employed, then

3Q_x = x(x - 1)/2 - V_x

if for V_x we put 0, when x = 6m + 1 or 6m + 3." This gives the formula for b
which was given earlier in this article. Put x = t and V_x = 0:

b = t(t - 1) / (3 · 2) = t(t - 1) / [k(k - 1)].

Besides the theory connected with these combinatorial problems, considerable 
information related to the construction of the configurations has been found in 
the literature on finite projective geometry, especially the geometry which applies 
to the theory of groups. 

An extensive discussion of the λ = 1 class of configurations (as listed in table I)
can be found in the literature. The theory of the formation of the configurations
for the sub-class t = 6m + 3 has been summarized by Ball [1]. This is the
Kirkman "school-girl problem" for which Eckenstein [7] lists 48 papers and 5
books written during the years 1847-1911 dealing with this subject. The
problem was first published in the Lady’s and Gentleman’s Diary for 1850 [12].
It is usually stated that "a schoolmistress was in the habit of taking her girls
for a daily walk. The girls were fifteen in number, and were arranged in five
rows of three each, so that each girl might have two companions. The problem
is to dispose of them so that for seven consecutive days no girl will walk with any
of her school-fellows in any triplet more than once." For this particular
sub-class (t = 6m + 3, k = 3), this type of configuration has been shown to exist

TABLE II 

Selected Group of Configurations 


(Balanced Incomplete Block Designs) 


   t       b       r     k     λ                 t       b       r     k     λ

   7       7       3     3     1   Y.S.        *25      50       8     4     1
   7       7       4     4     2                25      30       6     5     1
   8      14       7     4     3                25    15 + 15    3     5     1   L.S.
   9      12       4     3     1               *25      25       9     9     3
   9     6 + 6     2     3     1   L.S.         28      63       9     4     1
   9      18       8     4     3                28      36       9     7     2
   9      18      10     5     5               *29      29       8     8     2
   9      12       8     6     5                31      31       6     6     1   Y.S.
  10      30       9     3     2               *31      31      10    10     3
  10      15       6     4     2               *36      45      10     8     2
  10      18       9     5     4                37      37       9     9     2
  10      15       9     6     5               *41      82      10     5     1
  11      11       5     5     2               *46      69       9     6     1
  11      11       6     6     3               *46      46      10    10     2
  13      26       6     3     1                49      56       8     7     1
  13      13       4     4     1   Y.S.         49    28 + 28    4     7     1   L.S.
  13      13       9     9     6               *51      85      10     6     1
  15      35       7     3     1                57      57       8     8     1   Y.S.
  15      15       7     7     3                64      72       9     8     1
  15      15       8     8     4                64    72 + 72    9     8     2   L.S.
  16      20       5     4     1                73      73       9     9     1   Y.S.
  16    20 + 20    5     4     2   L.S.         81      90      10     9     1
  16      16       6     6     2                81    45 + 45    5     9     1   L.S.
  16      16      10    10     6                91      91      10    10     1   Y.S.
  19      57       9     3     1               121     132      12    11     1
  19      19       9     9     4               121    66 + 66    6    11     1   L.S.
  19      19      10    10     5               133     133      12    12     1   Y.S.
  21      70      10     3     1               169     182      14    13     1
  21      21       5     5     1   Y.S.        169    91 + 91    7    13     1   L.S.
 *21      28       8     6     2               183     183      14    14     1   Y.S.
 *21      30      10     7     3

* Have not been constructed.
Y.S.: Youden squares.
L.S.: Lattice squares.


for every possible value of t. Most of the solutions were worked by H. E.
Dudeney and O. Eckenstein. They are given by Ball [1] for all t’s less than 100,
that is, for t = 9, 15, 21, 27, 33, 39, 45, 51, 57, 63, 69, 75, 81, 87, 93 and 99.
Ball describes several methods of constructing such configurations, as cycles,
combinations of cycles, scalene triangles inscribed in the circle, focal and
analytical methods. As an illustration of the school-girl problem, the construction
of the configuration for t = 9, b = 12, r = 4, k = 3 and λ = 1 will be shown.
Scalene triangles are inscribed in a circle with certain specifications (to be
fulfilled) giving the three sets of triplets for the first day as follows,
Set      Group I

(1)      k 1 5
(2)      3 4 6
(3)      7 8 2.

By rotation or by cyclic substitution the other three groups are secured:

Set      Group II        Set      Group III       Set      Group IV

(4)      k 2 6           (7)      k 3 7           (10)     k 4 8
(5)      4 5 7           (8)      5 6 8           (11)     6 7 1
(6)      8 1 3,          (9)      1 2 4,          (12)     2 3 5.

Then placing k = 9, we have the configuration for t = 9, b = 12, and r = 4.
Note that in the school-girl problem the sets are grouped into complete
replications of the elements. This problem of 9 girls taken 3 at a time has been
subjected to an exhaustive examination. There are 840 arrangements but only one
fundamental solution. In the case of 15 girls, the number of fundamental
solutions according to Mulden [14] and Cole [6], is seven. Ball mentions the
Kirkman problem in quartets which is the sub-class t = 12m + 4, for k = 4.
He states that this has been solved for cases where m does not exceed 49. He
also states, "I conjecture that similar methods are applicable to corresponding
problems about quintets, sextets, etc."
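The cyclic construction for nine girls can be reproduced and checked in a few lines. The sketch below starts from the first day's triplets (k 1 5), (3 4 6), (7 8 2), places k = 9, and advances the symbols 1 to 8 cyclically to obtain the other three days; the function name is hypothetical.

```python
from collections import Counter
from itertools import combinations


def shift(symbol, g):
    """Advance a symbol g steps round the cycle 1..8; the symbol 9 (k) is fixed."""
    return symbol if symbol == 9 else (symbol - 1 + g) % 8 + 1


group_one = [(9, 1, 5), (3, 4, 6), (7, 8, 2)]
groups = [[tuple(shift(v, g) for v in block) for block in group_one]
          for g in range(4)]

pairs = Counter()
for group in groups:
    for block in group:
        pairs.update(combinations(sorted(block), 2))

for number, group in enumerate(groups, start=1):
    print(number, group)
print(len(pairs), set(pairs.values()))
```

Each group contains every girl once, and every one of the 36 pairs walks together exactly once, so the twelve sets form the configuration t = 9, b = 12, r = 4, k = 3 with each pair occurring once.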

Before leaving the school-girl problem, an illustration will be given of t = 28,
b = 63, r = 9, k = 4 and λ = 1. The following framework was set up by Dr.
C. P. Winsor using suggestions from Netto [15].


k      a      b      c
a1     a8     b3     b6
a2     a7     b1     b8
a3     a6     c4     c5
a4     a5     c1     c8
b2     b7     c3     c6
b4     b5     c2     c7,

a, b and c each have every internal difference once and only once; and each pair
a-b, a-c and b-c must have every external difference once and only once. The
nine groups are given in table III. The cyclic substitution is within three sets,
a, b and c. That is,

in group I,      a = 1,   a1 = 2,   a2 = 3,   . . . ,   a8 = 9

in group II,     a = 2,   a1 = 3,   a2 = 4,   . . . ,   a8 = 1

in group III,    a = 3,   a1 = 4,   a2 = 5,   . . . ,   a8 = 2

etc.
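
The framework and its cyclic substitution can be expanded by machine. The sketch below is an illustration under stated assumptions: the subscripts of the framework are read as (a1 a8 b3 b6), (a2 a7 b1 b8), (a3 a6 c4 c5), (a4 a5 c1 c8), (b2 b7 c3 c6), (b4 b5 c2 c7), the symbols a, b, c are numbered 1-9, 10-18 and 19-27, and k = 28. The names `FRAMEWORK`, `OFFSET`, `element` and `group` are hypothetical.

```python
from collections import Counter
from itertools import combinations

# Framework rows as (letter, subscript); plain a, b, c carry subscript 0.
FRAMEWORK = [
    [("k", 0), ("a", 0), ("b", 0), ("c", 0)],
    [("a", 1), ("a", 8), ("b", 3), ("b", 6)],
    [("a", 2), ("a", 7), ("b", 1), ("b", 8)],
    [("a", 3), ("a", 6), ("c", 4), ("c", 5)],
    [("a", 4), ("a", 5), ("c", 1), ("c", 8)],
    [("b", 2), ("b", 7), ("c", 3), ("c", 6)],
    [("b", 4), ("b", 5), ("c", 2), ("c", 7)],
]
OFFSET = {"a": 1, "b": 10, "c": 19}  # a -> 1..9, b -> 10..18, c -> 19..27

def element(letter, subscript, shift):
    """Numeric label of a framework symbol after `shift` cyclic steps."""
    if letter == "k":
        return 28
    return OFFSET[letter] + (subscript + shift) % 9

def group(shift):
    """The seven sets of one replication group (shift 0 is group I)."""
    return [tuple(element(l, s, shift) for l, s in row) for row in FRAMEWORK]

groups = [group(shift) for shift in range(9)]  # groups I..IX

# Count how often each unordered pair of elements shares a set.
counts = Counter(frozenset(p) for g in groups for block in g
                 for p in combinations(block, 2))
print(groups[0])
print("distinct pairs:", len(counts), "co-occurrences:", set(counts.values()))
```

The 63 sets contain 63 x 6 = 378 pairs, the number of pairs among 28 elements, so the count confirms λ = 1 only if every pair appears exactly once; each group should also contain all 28 elements.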


Netto [15] discusses t elements in sets of k, every set of 2 elements to occur
together in a set exactly λ times. He deals with λ = 1, and gives a discussion
of both sub-classes when k = 3, that is, for t = 6m + 1 and t = 6m + 3. Reiss
[16] and Moore [13] have proved that configurations can be constructed for all
values of t if k = 3. This is the type of information which is valuable in answer- 


TABLE III

Configuration for t = 28, b = 63, r = 9, k = 4, λ = 1


 Framework        Group I         Group II        Group III       Group IV

 k  a  b  c     28  1 10 19     28  2 11 20     28  3 12 21     28  4 13 22

a1 a8 b3 b6      2  9 13 16      3  1 14 17      4  2 15 18      5  3 16 10

a2 a7 b1 b8      3  8 11 18      4  9 12 10      5  1 13 11      6  2 14 12

a3 a6 c4 c5      4  7 23 24      5  8 24 25      6  9 25 26      7  1 26 27

a4 a5 c1 c8      5  6 20 27      6  7 21 19      7  8 22 20      8  9 23 21

b2 b7 c3 c6     12 17 22 25     13 18 23 26     14 10 24 27     15 11 25 19

b4 b5 c2 c7     14 15 21 26     15 16 22 27     16 17 23 19     17 18 24 20


   Group V         Group VI        Group VII       Group VIII      Group IX

 28  5 14 23     28  6 15 24     28  7 16 25     28  8 17 26     28  9 18 27

  6  4 17 11      7  5 18 12      8  6 10 13      9  7 11 14      1  8 12 15

  7  3 15 13      8  4 16 14      9  5 17 15      1  6 18 16      2  7 10 17

  8  2 27 19      9  3 19 20      1  4 20 21      2  5 21 22      3  6 22 23

  9  1 24 22      1  2 25 23      2  3 26 24      3  4 27 25      4  5 19 26

 16 12 26 20     17 13 27 21     18 14 19 22     10 15 20 23     11 16 21 24

 18 10 25 21     10 11 26 22     11 12 27 23     12 13 19 24     13 14 20 25



ing the first question in the introduction of this article; "what configurations 
exist?" Carmichael [5] mentions the quadruple systems 6m + 2 and 6m + 4
and states that the general problem of their existence appears not to have been 
solved. Also for the higher values of k there seems to be very little known of 
any generality, but it is known that for k > 3 there are certain configurations 
which are not possible. 

3. The method of geometrical configuration. Another aid in the construction
of balanced incomplete block designs is found in some of the finite projective
geometries. These are described by Carmichael [6]. A tactical configuration
of rank two is defined as a combination of l elements into m sets, each set
containing λ distinct elements, and each element occurring in n distinct sets,

l = (t) = number of points in the geometry,
m = (b) = number of lines,
λ = (k) = number of points on a line,
n = (r) = number of lines on a point.


The series of finite projective geometries PG(k, p^n) for k > 1 furnishes a
certain infinite class of these tactical configurations. The following list gives
those which have been incorporated in the list (table II) of useful balanced
incomplete block designs.

Two dimensional space, PG(2, p^n)


p^n      l(t)     m(b)     λ(k)     n(r)

2          7        7        3        3

3         13       13        4        4

2^2       21       21        5        5

5         31       31        6        6

7         57       57        8        8

2^3       73       73        9        9

3^2       91       91       10       10

11       133      133       12       12

13       183      183       14       14.


Three dimensional space, PG(3, p^n)


p^n      l(t)     m(b)     λ(k)     n(r)

2         15       35        3        7.

From the Euclidean geometry EG(k, p^n) for k > 1 other tactical configurations
can be constructed. These are formed from the PG(k, p^n) by omitting a given
line from the two dimensional space and a plane from the three dimensional
space configurations. Some of the resulting designs are:



Two dimensional space, EG(2, p^n)


p^n      l(t)     m(b)     λ(k)     n(r)

2          4        6        2        3

3          9       12        3        4

2^2       16       20        4        5

5         25       30        5        6

7         49       56        7        8

2^3       64       72        8        9

3^2       81       90        9       10

11       121      132       11       12

13       169      182       13       14.
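
The tabulated parameters can be checked against the two counting identities that every balanced incomplete block design with λ = 1 must satisfy, bk = rt and λ(t - 1) = r(k - 1). The sketch below is an illustration, not part of the original: it uses the standard point and line counts of the finite planes of order q = p^n, and the function names `pg2`, `eg2` and `is_balanced` are hypothetical.

```python
def pg2(q):
    """(t, b, k, r) for the projective plane PG(2, q)."""
    t = q * q + q + 1
    return t, t, q + 1, q + 1

def eg2(q):
    """(t, b, k, r) for the Euclidean plane EG(2, q)."""
    return q * q, q * q + q, q, q + 1

def is_balanced(t, b, k, r, lam=1):
    """Check the identities b*k = r*t and lam*(t - 1) = r*(k - 1)."""
    return b * k == r * t and lam * (t - 1) == r * (k - 1)

ORDERS = (2, 3, 4, 5, 7, 8, 9, 11, 13)  # p^n values listed in the tables
for q in ORDERS:
    print(q, pg2(q), eg2(q), is_balanced(*pg2(q)), is_balanced(*eg2(q)))

# The three dimensional entry: 15 points, 35 lines, 3 points per line,
# 7 lines on a point.
print(is_balanced(15, 35, 3, 7))
```

For example, q = 3 gives (13, 13, 4, 4) for the projective plane and (9, 12, 3, 4) for the Euclidean plane.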
Methods are available for constructing the two dimensional space PG(k, p^n)
and the corresponding EG(k, p^n) configurations where p is a prime number.
This being true, we can also construct the completely orthogonalized squares
from the EG(k, p^n) geometry. The reverse situation in which these
configurations are constructed by using the completely orthogonalized squares
is to be illustrated. These squares consist of superimposed Latin squares,
fulfilling the condition that each number from the second Latin square occurs
once and only once with each number in the first Latin square. As an example
take the two Latin squares:

Latin Square I          Latin Square II

    1 2 3                   1 3 2

    2 3 1                   2 1 3

    3 1 2,                  3 2 1.


Superimpose square II upon square I to get the completely orthogonalized
3x3 square,

    11    23    32

    22    31    13

    33    12    21.


The first number in each cell is a value from square I; the second number in each
cell is from square II. Note that the numbers in the second place in each cell
occur once and only once with each of the first numbers, that is 1-1, 1-3, and 1-2.
The completely orthogonalized squares have been proven to exist for all prime
numbers and for powers of prime numbers. The solution of this problem was
secured independently by Bose [2] and by Stevens [18]. Those of sides 2, 2^2, 2^3,
2^4, 2^5, 2^6, 3, 3^2, 3^3, 3^4, 5, 5^2, 5^3, 7, 7^2, 11 and 13 have been given.

The completely orthogonalized 3x3 square may be used to construct


    11 (1)    23 (4)    32 (7)

    22 (2)    31 (5)    13 (8)

    33 (3)    12 (6)    21 (9)


a balanced incomplete block design. The numbers in parentheses, which follow
the cell numbers, designate the 9 elements which are to be arranged in four
groups of three sets. Group I is formed by placing the elements from each row
into separate sets; in group II the elements from the three columns are placed
in three sets; in group III the first set (7) consists of the elements which
follow 1 in the first place in the cells, set (8) consists of the elements which
follow 2 in the first place in the cells; and group IV is assembled in the same
way as group III except the numbers in the second place in the cells are used to
select the elements for each set. Thus we have the configuration:
        Group I           Group II           Group III           Group IV
Set     (rows)            (columns)          (first place)       (second place)

(1)     1 4 7       (4)   1 2 3        (7)   1 6 8        (10)   1 5 9

(2)     2 5 8       (5)   4 5 6        (8)   2 4 9        (11)   2 6 7

(3)     3 6 9       (6)   7 8 9        (9)   3 5 7        (12)   3 4 8


In the 12 sets of 3 elements, each of the 9 elements occurs with every other 
element once and only once in a set. 
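
The construction of the four groups from the superimposed squares can be sketched in a few lines of Python. This is an illustration only; the names `SQUARE_I`, `SQUARE_II`, `ELEMENTS` and `sets_by` are hypothetical, and the element numbers are those attached to the cells in the text (1, 4, 7 in the first row, and so on).

```python
from collections import Counter
from itertools import combinations

SQUARE_I = [[1, 2, 3], [2, 3, 1], [3, 1, 2]]
SQUARE_II = [[1, 3, 2], [2, 1, 3], [3, 2, 1]]
# Element numbers attached to the nine cells, row by row.
ELEMENTS = [[1, 4, 7], [2, 5, 8], [3, 6, 9]]

cells = [(r, c) for r in range(3) for c in range(3)]

# Orthogonality: every ordered pair (symbol of I, symbol of II) occurs once.
ordered_pairs = {(SQUARE_I[r][c], SQUARE_II[r][c]) for r, c in cells}
orthogonal = len(ordered_pairs) == 9

def sets_by(key):
    """Collect the elements of the cells that share the same key value."""
    found = {}
    for r, c in cells:
        found.setdefault(key(r, c), []).append(ELEMENTS[r][c])
    return [tuple(sorted(found[k])) for k in sorted(found)]

group_1 = sets_by(lambda r, c: r)                # rows
group_2 = sets_by(lambda r, c: c)                # columns
group_3 = sets_by(lambda r, c: SQUARE_I[r][c])   # number in the first place
group_4 = sets_by(lambda r, c: SQUARE_II[r][c])  # number in the second place

all_sets = group_1 + group_2 + group_3 + group_4
counts = Counter(frozenset(p) for s in all_sets for p in combinations(s, 2))
print(orthogonal, group_3, group_4, set(counts.values()))
```

Because the two squares are orthogonal, two elements that share a row, a column, or a first-place number cannot share a second-place number as well, which is why no pair is repeated among the 12 sets.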

This is an illustration of one series of configurations which can be constructed
with the aid of the completely orthogonalized squares. These are the EG(k, p^n)
in two dimensional space when k = 2 and p^n = 2, 3, 2^2, 5, 7, 2^3, 3^2, 11, 13, . . .
The PG(k, p^n) configurations can be written by adding (k + 1) elements
to the previous group of configurations. For example, the elements 10, 11, 12
and 13 may be added to the groups, one to each group. That is, 10 is added to
each set in group I, 11 is added to each set in group II, 12 to group III and 13 to
group IV. An additional set must be added to include these four new elements.
A configuration for t = 13, b = 13, k = 4, r = 4 and λ = 1 results.

Set

(1)   1 4 7 10      (4)   1 2 3 11      (7)   1 6 8 12      (10)   1 5 9 13

(2)   2 5 8 10      (5)   4 5 6 11      (8)   2 4 9 12      (11)   2 6 7 13

(3)   3 6 9 10      (6)   7 8 9 11      (9)   3 5 7 12      (12)   3 4 8 13


(13) 10 11 12 13.


The 13 sets are made up of 4 elements each. These designs are symmetrical
for sets and elements, that is, every pair of elements occurs together in the same
number of sets; also, every pair of sets has the same number of elements in
common. Discussions of the construction of these designs, with illustrations,
are given in references [20, 8, 9] and [19].

In the PG(k, p^n) series of designs, as constructed by means of completely
orthogonalized squares, the sets cannot be arranged in replication groups.
However, these configurations can be arranged in Youden squares [22] in which all
the sets are placed side by side and all the elements in a single row form a
complete replication. This method of arrangement has been of considerable value
in experimentation with plants. The Youden squares are the PG(k, p^n) when
k = 2. Singer [17] gives a partial list of the (reduced) perfect difference sets
(table IV), only a single set for each p^n. The number of distinct perfect
difference sets (or the number of distinct perfect partitions) for a given p^n is
equal to φ(q)/3n. Since each perfect difference set can be paired with its
inverse, the number is even.

The construction of one of the Youden squares from its perfect difference set
will be illustrated. Consider p^n = 3; then q = p^2n + p^n + 1 = 3^2 + 3 + 1 = 13.
There are two perfect difference sets with their inverses for q = 13. One perfect
difference set is 0, 1, 3, 9 which has the perfect partition 1, 2, 6, 4, whose
terms, added in succession, give each number from 1 to and including 13; in
particular 1, 2, 6, 4 add to 13. The elements of the perfect difference set are
put in set (1) except that 13 replaces 0. Set (2) is secured by a one-step cyclic
substitution, 1 for 13, 2 for 1, 4 for 3 and 10 for 9. This process is continued
until there are thirteen sets. If the substitution is applied to set (13), the
elements in set (1) are secured.






                                       Set

            (1) (2) (3) (4) (5) (6) (7) (8) (9) (10) (11) (12) (13)

Replica- A   13   1   2   3   4   5   6   7   8    9   10   11   12

tion     B    1   2   3   4   5   6   7   8   9   10   11   12   13

         C    3   4   5   6   7   8   9  10  11   12   13    1    2

         D    9  10  11  12  13   1   2   3   4    5    6    7    8.

This is the Youden square for t = 13, b = 13, r = 4, k = 4, and λ = 1. The
elements in each row form a complete replication.
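
The development of the perfect difference set into the thirteen sets can be sketched as follows. This Python fragment is an illustration only (the names `Q`, `DIFFERENCE_SET`, `label`, `sets` and `rows` are hypothetical); it checks that the twelve non-zero differences modulo 13 are all distinct and then applies the one-step cyclic substitution repeatedly.

```python
from collections import Counter
from itertools import combinations

Q = 13
DIFFERENCE_SET = (0, 1, 3, 9)

# Perfect difference set: each non-zero residue mod 13 arises exactly once
# as a difference of two members.
differences = Counter((a - b) % Q for a in DIFFERENCE_SET
                      for b in DIFFERENCE_SET if a != b)

def label(x):
    """Write residues as 1..13, with 13 replacing 0 as in the text."""
    return x % Q or Q

# Set (j + 1) is set (1) after j one-step cyclic substitutions.
sets = [tuple(label(d + j) for d in DIFFERENCE_SET) for j in range(Q)]
rows = list(zip(*sets))  # replications A, B, C, D of the Youden square

counts = Counter(frozenset(p) for s in sets for p in combinations(s, 2))
for name, row in zip("ABCD", rows):
    print(name, row)
print("co-occurrences:", set(counts.values()))
```

Each row of the printed square is a cyclic shift of 1, ..., 13, so it contains every element once, which is the complete replication property described above.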


TABLE IV

Singer's list of perfect difference sets


p^n      q      φ(q)/3n    Perfect difference set

2        7         2       0 1 3

2^2     21         2       0 1 4 14 16

2^3     73         8       0 1 3 7 15 31 36 54 63

2^4    273        12       0 1 3 7 15 31 63 90 116 127 136 181 194 204 ...

3       13         4       0 1 3 9

3^2     91        12       0 1 3 9 27 49 56 61 77 81

5       31        10       0 1 3 8 12 18

7       57        12       0 1 3 13 32 36 43 52

11     133        36       0 1 3 12 20 34 38 81 88 94 104 109

13     183        40       0 1 3 16 23 28 42 76 82 86 119 137 154 175


t = q = p^2n + p^n + 1



A third series of configurations, called Lattice squares or quasi-Latin squares 
[21] can be constructed by using the completely orthogonalized squares. The 
groups of sets on page 78 are taken in pairs. For each pair a square is constructed 
having its rows formed by the sets of one group and its columns by the sets of 
another group. For example, square I below is made so that the sets of group I 
form the rows and the sets of group II form the columns. Square II is the 
combination of groups III and IV. 

    Square I            Square II


    1   4   7           1   6   8

    2   5   8           9   2   4

    3   6   9           5   7   3

In this lattice square each pair of elements occurs together once only in either a
row or a column of either one of the squares. Also, every element occurs with
every other element once in one column and one row from each square.

A device known as “complements” gives several configurations. From an 
arrangement having k jf, a second one can be obtained for the same number 
of elements, in sets of t - k units. This is done by replacing each set by its
complement, that is, by a set containing all the elements missing from the
original set. An illustration follows:


    t = 7,  b = 7                 t = 7,  b = 7

    r = 3,  k = 3                 r = 4,  k = 4

        λ = 1                         λ = 2

Set                           Set

(1)    1  2  4                (1)    3  5  6  7

(2)    2  3  5                (2)    1  4  6  7

(3)    3  4  6                (3)    1  2  5  7

(4)    4  5  7                (4)    1  2  3  6

(5)    5  6  1                (5)    2  3  4  7

(6)    6  7  2                (6)    1  3  4  5

(7)    7  1  3,               (7)    2  4  5  6.
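
The device of complements is easy to verify by direct counting. The following Python sketch is an illustration only (the names `T`, `BLOCKS`, `complement` and `pair_values` are hypothetical): it forms the complement of each of the seven sets and counts the co-occurrences of pairs in both arrangements.

```python
from collections import Counter
from itertools import combinations

T = 7
BLOCKS = [(1, 2, 4), (2, 3, 5), (3, 4, 6), (4, 5, 7),
          (5, 6, 1), (6, 7, 2), (7, 1, 3)]

def complement(block, t=T):
    """All elements 1..t that are missing from `block`."""
    return tuple(e for e in range(1, t + 1) if e not in block)

def pair_values(blocks):
    """Set of co-occurrence counts taken over all pairs that occur."""
    counts = Counter(frozenset(p) for blk in blocks
                     for p in combinations(blk, 2))
    return len(counts), set(counts.values())

complements = [complement(block) for block in BLOCKS]
print(complements)
print(pair_values(BLOCKS), pair_values(complements))
```

A pair lies in the complement of a set exactly when neither of its elements is in the set; counting the sets that miss both elements gives b - 2r + λ = 7 - 6 + 1 = 2, in agreement with the λ = 2 shown for the second arrangement.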


While the triple systems, quadruple systems, etc., which have been considered
by some mathematicians, do furnish designs meeting the balance requirements,
they are usually not suitable for experimental purposes. A quadruple system
requires that every possible triple of elements occur once and only once
together in a block. Since we need only every pair together once (λ = 1)
or more, only the triple systems are generally useful.

4. Summary. The mathematical theory of configuration has been helpful
in the construction of the balanced incomplete block designs. It would be
useful to know (a) what configurations (within the useful range) exist, (b) how
these configurations may be constructed. In table I the configurations have been
classified according to the value of λ, while in table II configurations within a
useful range have been listed. Of the designs in this table which have not been
constructed, some are known to exist. Those aids which have been used in the
construction of the balanced incomplete block designs have been briefly
discussed.
A COMPARISON OF ALTERNATIVE TESTS OF SIGNIFICANCE FOR
THE PROBLEM OF m RANKINGS¹

By Milton Friedman 

A paper published in 1937 [2] suggested that the consilience of a number of
sets of ranks can be tested by computing a statistic designated $\chi_r^2$. A
mathematical proof by S. S. Wilks demonstrated that the distribution of
$\chi_r^2$ approaches the ordinary $\chi^2$ distribution as the number of sets of
ranks increases. The rapidity with which this limiting distribution is
approached was investigated by obtaining the exact distributions of $\chi_r^2$ for
a number of special cases. It was concluded that "when the number of sets of
ranks is moderately large (say greater than 5 for four or more ranks) the
significance of $\chi_r^2$ can be tested by reference to the available $\chi^2$
tables" [2, p. 695]. The use of the normal distribution was recommended when the
number of ranks in each set is large, but the number of sets of ranks is small,
although no rigorous justification of this procedure was presented.

Except for the few special cases for which exact distributions were given, the
paper did not provide a test of significance for data involving less than six sets of
ranks and a small or moderate number of ranks in each set. This important
gap has now been filled by M. G. Kendall and B. Babington Smith [1]. In
addition, they furnish a somewhat more exact test of significance for tables of
ranks for which the earlier article recommended the use of the $\chi^2$ distribution.

Kendall and Smith use a different statistic, W, defined as $\chi_r^2$ divided by its
maximum value, $m(n - 1)$, where n is the number of items ranked, and m the
number of sets of ranks.² The new statistic (independently suggested by W.
Allen Wallis who terms it the rank correlation ratio and denotes it by $\eta_r^2$) is
thus not fundamentally different from $\chi_r^2$. A more radical innovation is the
improvement in the test of significance that they suggest. Instead of testing
$\chi_r^2$ by reference to the $\chi^2$ distribution for n - 1 degrees of freedom,
Kendall and Smith, generalizing from the first four moments of W, recommend that
the significance of W be tested by reference to the analysis of variance
distribution (Fisher's z-distribution) with

$$z = \tfrac{1}{2}\log_e \frac{(m-1)W}{1-W}, \qquad n_1 = (n-1) - \frac{2}{m}, \qquad n_2 = (m-1)\left[(n-1) - \frac{2}{m}\right].$$

For small values of m and n, they introduce continuity corrections, substituting
for

$$W = \frac{S}{m^2(n^3-n)/12}$$

the statistic

$$W_c = \frac{S-1}{\dfrac{m^2(n^3-n)}{12} + 2} = \frac{W - \dfrac{12}{m^2(n^3-n)}}{1 + \dfrac{24}{m^2(n^3-n)}},$$

where S is the observed sum of squares of the deviations of sums of ranks from
the mean value, $m(n + 1)/2$. Comparison with exact distributions of W (or S)
for special cases indicates that this test yields very good approximations to the
correct probabilities.

¹ The author is indebted to Mr. W. Allen Wallis for valuable criticism and to Miss Edna
R. Ehrenberg for computational assistance.

² This is Kendall and Smith's notation which will be used in the present paper. The
original paper [2] designated the number of items ranked by p, and the number of sets of
ranks by n.

In the limit the two tests of significance are identical. Neglecting the
correction for continuity,

$$z = \tfrac{1}{2}\log_e \frac{(m-1)\chi_r^2}{m(n-1) - \chi_r^2} \;\to\; \tfrac{1}{2}\log_e \frac{\chi_r^2}{n-1}, \qquad n_2 = (m-1)\left[(n-1) - \frac{2}{m}\right] \to \infty,$$

and $n_1 = (n-1) - \dfrac{2}{m} \to (n-1)$ as $m \to \infty$. For $n_2 = \infty$
the analysis of variance distribution is identical with the distribution
of $\tfrac{1}{2}\log_e (\chi^2/n_1)$. The difference between the two tests is thus
that one, $\chi^2$, uses a single (limiting) distribution for all values of m,
whereas the other, z, adapts the distribution to the value of m.
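
The quantities entering the two tests can be computed directly from a table of ranks. The Python sketch below is an illustration under stated assumptions: the function name `rank_statistics` and the small table `EXAMPLE` are hypothetical, ties are not handled, and the continuity-corrected W is used in z, which requires S > 1. It does not look up significance levels; those still come from the $\chi^2$ or analysis of variance tables.

```python
import math

def rank_statistics(ranks):
    """Statistics for m rankings of n items.

    `ranks` is a list of m lists; each inner list is a permutation of 1..n
    giving the ranks assigned to the n items in one set of ranks.
    """
    m, n = len(ranks), len(ranks[0])
    sums = [sum(row[j] for row in ranks) for j in range(n)]
    mean = m * (n + 1) / 2                    # mean of the sums of ranks
    S = sum((x - mean) ** 2 for x in sums)    # sum of squared deviations
    s_max = m * m * (n ** 3 - n) / 12         # maximum possible value of S
    W = S / s_max
    chi_r2 = m * (n - 1) * W
    W_c = (S - 1) / (s_max + 2)               # continuity-corrected W
    z = 0.5 * math.log((m - 1) * W_c / (1 - W_c))  # needs S > 1
    n1 = (n - 1) - 2 / m
    n2 = (m - 1) * n1
    return {"S": S, "W": W, "chi_r2": chi_r2, "W_c": W_c,
            "z": z, "n1": n1, "n2": n2}

# Hypothetical example: m = 3 sets of ranks of n = 4 items.
EXAMPLE = [[1, 2, 3, 4], [2, 1, 3, 4], [1, 3, 2, 4]]
stats = rank_statistics(EXAMPLE)
for name, value in stats.items():
    print(name, round(value, 4))
```

In this example the sums of ranks are 4, 6, 8 and 12, so S = 35, W = 35/45 and $\chi_r^2$ = 7; the z test would refer z to the analysis of variance distribution with the fractional degrees of freedom n1 and n2 returned by the function.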

The necessity of taking into account the value of m, while it increases the
flexibility of the distribution, makes the z test somewhat less convenient in
practice than the $\chi^2$ test. Additional computation is required to obtain the
values of $n_1$ and $n_2$, and to make the continuity corrections. It is also fairly
laborious to test the significance of the result, if exact values of z at any level of
significance are required. In these instances, two-way interpolation of
reciprocals in the analysis of variance tables is necessary since both $n_1$ and
$n_2$ are always fractional. These difficulties make it desirable to investigate
the rapidity with which the significance levels given by the z test approach
those given by the $\chi^2$ test, and thus determine the range of values of m and n
for which the simpler test can safely be employed. This investigation will yield
as a by-product the .05 and .01 significance values of $\chi_r^2$ (or W or S) for
selected values of m and n as determined by the z test.

Table I presents a summary comparison of the values of $\chi_r^2$ at the .05 and .01
levels of significance as shown by (1) exact distributions, (2) the z test with
continuity corrections, (3) the $\chi^2$ test.³ The significance values are expressed
in terms of $\chi_r^2$ rather than W because, for a given number of ranks per set
(i.e., a given n), the significance values given by the $\chi^2$ test are the same
regardless of the number of sets of ranks (i.e., of the value of m). This would
not be so if W were employed, since $W = \chi_r^2/m(n - 1)$. The expected value of W
depends on m and approaches zero as $m \to \infty$ while the expected value of
$\chi_r^2$ is equal to n - 1 for all values of m.

³ The values of $\chi_r^2$ computed using the z test that are given in Tables I and II
were obtained with the aid of Fisher and Yates' Table V [4]. Linear interpolation of
reciprocals was employed throughout.

The values given by the z test agree remarkably well with the exact values.
With but two exceptions (the .01 values for n = 3, m = 8 and 10) the exact
value differs very much less from the value given by the z test than from the
value given by the $\chi^2$ test. In all but three of the 12 comparisons, the z test
gives a value below the correct one.⁴

TABLE I

Comparison of Values of $\chi_r^2$ at .05 and .01 Levels of Significance Yielded by
Exact Distributions, z Test with Continuity Corrections, and $\chi^2$ Test


                   .05 Level of Significance                      .01 Level of Significance

             From Exact          From z                     From Exact          From z
             Distribution        test with                  Distribution        test with
                                 continuity   From                              continuity   From
                      Interpo-   correc-      $\chi^2$                Interpo-   correc-      $\chi^2$
 n    m    Limits     lated      tions        test       Limits       lated      tions        test
                      value*                                          value*

 3    8    5.25-6.25   6.16      6.012        5.991                    9.00       8.35         9.21
      9    6.0 -6.22   6.17      6.004        5.991                    8.67       8.44         9.21
     10    5.6 -6.2    6.08      5.999        5.991      8.6 - 9.6     9.04       8.51         9.21
      ∞                          5.991        5.991                               9.21         9.21

 4    4    7.5 -7.8    7.54      7.43         7.82       9.3 - 9.6     9.42       9.21        11.34
      5    7.32-7.8    7.54      7.52         7.82       9.72- 9.96    9.87       9.66        11.34
      6    7.4 -7.6    7.49      7.57         7.82                    10.00       9.95        11.34
      ∞                          7.82         7.82                               11.34        11.34

 5    3    8.27-8.53   8.41      8.59         9.49       9.87-10.13   10.05      10.08        13.28
      ∞                          9.49         9.49                               13.28        13.28


* Computed by linear interpolation of probabilities.


Table II gives for a very much larger number of values of m and n the .05
and .01 values of $\chi_r^2$ computed on the basis of the z test with continuity
corrections. The values entered for $m = \infty$ are obtained from $\chi^2$ tables for
n - 1 degrees of freedom and are the significance values by the $\chi^2$ test for all
values of m. It is apparent that as m increases the .01 and .05 values of
$\chi_r^2$ approach their limiting values very rapidly. For n = 7, two-thirds of the
difference between the .05 values for m = 3 and $m = \infty$, and an even larger
proportion of the difference between the .01 values, disappears by the time
m = 10; and the situation is similar for the other values of n. Except for the
.05 values for n = 3 the approach to the limit is monotonic from below. The use
of the $\chi^2$ test thus tends to lead to the overestimation of the significance
values and of the probabilities attached to observed values of $\chi_r^2$. It is
clear, however, that for large and even moderate values of m the $\chi^2$ test is,
for all practical purposes, equivalent to the z test.

⁴ These comparisons duplicate some of those made by Kendall and Smith and merely
serve to confirm their conclusion that the z test with continuity corrections gives
exceedingly good results. The values obtained using the z test without continuity
corrections agree less well with the exact values than those obtained with the aid of
the continuity corrections. However, even when no corrections are made the z test in
general yields values closer to the exact values than does the $\chi^2$ test.


TABLE II

Values of $\chi_r^2$ at .05 and .01 Levels of Significance Computed on the Basis of
Kendall and Smith's z test, with Continuity Corrections; .10, .075, .02, .015
Values of $\chi^2$


                                      n
      m              3         4         5         6         7

                Values at .05 Level of Significance

      3                                 8.59      9.90     11.24
      4                       7.43      8.84     10.24     11.62
      5                       7.52      8.98     10.42     11.84
      6                       7.57      9.08     10.54     11.97
      8            6.012      7.63      9.18     10.68     12.14
     10            5.999      7.67      9.25     10.76     12.23
     15            5.985      7.72      9.33     10.87     12.36
     20            5.983      7.74      9.37     10.92     12.42
    100            5.987      7.80      9.46     11.04     12.56
      ∞            5.991      7.82      9.49     11.07     12.59

 $\chi^2$(.10)       4.605      6.25      7.78      9.24     10.64
 $\chi^2$(.075)*     5.18       6.90      8.49     10.00     11.45

                Values at .01 Level of Significance

      3                                10.08     11.69     13.26
      4                       9.21     10.93     12.59     14.19
      5                       9.66     11.42     13.11     14.74
      6                       9.95     11.74     13.45     15.09
      8            8.35      10.31     12.13     13.87     15.53
     10            8.51      10.52     12.37     14.11     15.79
     15            8.74      10.79     12.67     14.44     16.14
     20            8.85      10.93     12.82     14.60     16.31
    100            9.14      11.26     13.19     14.99     16.71
      ∞            9.21      11.34     13.28     15.09     16.81

 $\chi^2$(.02)       7.82       9.84     11.67     13.39     15.03
 $\chi^2$(.015)*     8.40      10.46     12.34     14.09     15.77


* Computed from Fisher and Yates' Table IV [4] by linear interpolation between the
logarithms of the probabilities.


In order to determine more precisely the range of values of m and n for which
the approximation given by the $\chi^2$ test is adequate, it is necessary to adopt
some convention about the error in estimated significance values of $\chi_r^2$ that
is tolerable. Since the conclusion drawn from an observed $\chi_r^2$ depends on the
probability that it will be exceeded by chance, this convention clearly should be
expressed in terms of the error in the probability.

The structure of published $\chi^2$ tables makes it convenient to accept an
estimated probability between .10 and .05 as a tolerable approximation to a
correct probability of .05, and an estimated probability between .02 and .01 as a
tolerable approximation to a correct probability of .01. These ranges of
tolerance are entirely on one side of the correct probability because, as pointed
out above, the error in using the $\chi^2$ test is consistent in direction. These
ranges are purely arbitrary, of course, and many may think them too broad.

On the basis of this or some similar convention it is possible to make objective
statements concerning the range of values of m and n for which the $\chi^2$ test is
adequate. The next to the last line in the first section of Table II gives the .10
values of $\chi^2$; the next to the last line in the second section, the .02 values.
All the .05 values of $\chi_r^2$ shown in the table exceed the .10 value of $\chi^2$.
Using the $\chi^2$ test, all of the values (with two exceptions for n = 3) would
signify a probability greater than .05 but less than .10. Thus the error made at
the .05 level is within the admissible range according to the suggested
convention. The $\chi^2$ test is therefore an adequate substitute for the z test at
the .05 level for all values of m and n except possibly for a few of the values
for which exact distributions are available.

As might be expected, the $\chi^2$ test is less satisfactory at the .01 level. For
values of m less than six, the .01 values of $\chi_r^2$ computed using the z test
with continuity corrections are less than the .02 value of $\chi^2$. For m greater
than 5, the .01 values would all be accorded a probability greater than .01
but less than .02 if the $\chi^2$ test were employed. As already noted, this is the
range of values of m for which the original paper suggested the $\chi^2$ test could
validly be used [2, p. 695].

In view of the arbitrary nature of the convention as to the permissible error
in the probability attached to an observed value of $\chi_r^2$, it is interesting to
investigate the effect of an alternative and stricter convention, namely, that only
probabilities from .075 to .05 and from .015 to .01 be accepted as approximations
to correct probabilities of .05 and .01 respectively. The .075 and .015 values of
$\chi^2$ are given in the last lines of the two sections of Table II. On the basis of
this convention the $\chi^2$ test is adequate at the .05 level for m greater than
three, and at the .01 level for m greater than nine, except possibly for a few of
the values for which exact distributions are available. Thus even so drastic a
lowering of the permissible margin of error as halving it limits only slightly the
range of values of m for which the $\chi^2$ test is adequate.

TABLE III

Values of S at .05 and .01 Levels of Significance Computed on the Basis of Kendall
and Smith's z test, with Continuity Corrections


                               n                            Additional values
                                                                for n = 3
    m        3        4        5        6        7            m         S

                Values at .05 Level of Significance

    3                         64.4      ...     157.3         9        54.0
    4                49.5     88.4    143.3      ...         12        71.9
    5                62.6    112.3    182.4     276.2        14        83.8
    6                75.7    136.1      ...     335.2        16        95.8
    8       48.1    101.7    183.7      ...     453.1        18       107.7
   10       60.0    127.8    231.2    376.7      ...
   15       89.8    192.9    349.8      ...     864.9
   20      119.7    258.0    468.5    764.4    1158.7

                Values at .01 Level of Significance

    3                         75.6    122.8     185.6         9        75.9
    4                61.4      ...    176.2     265.0        12       103.5
    5                80.5    142.8    229.4     343.8        14       121.9
    6                99.5    176.1    282.4     422.6        16       140.2
    8       66.8    137.4    242.7    388.3     579.9        18       158.6
   10       85.1    175.3      ...    494.0     737.0
   15      131.0    269.8    475.2    758.2    1129.5
   20      177.0    364.2    641.2   1022.2    1521.9


Table II provides, of course, a direct means of testing the significance of
observed values of $\chi_r^2$ for the tabled values of m and n. For this purpose,
however, Table III, giving the significance values of S, is more useful, since it
obviates the necessity of converting S into $\chi_r^2$. For n = 3 Table III includes
a few values of m in addition to those in Table II.
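
The conversion that Table III makes unnecessary follows from the definitions $W = S / [m^2(n^3 - n)/12]$ and $\chi_r^2 = m(n - 1)W$, which together give $\chi_r^2 = 12S / [mn(n + 1)]$. The short Python sketch below is an illustration only (the function names are hypothetical); it converts in both directions and reproduces a few tabled entries from the rounded values of $\chi_r^2$, so agreement is to be expected only to about the last figure shown.

```python
def chi_r2_from_s(S, m, n):
    """Convert the sum of squares S into chi_r^2 for m rankings of n items."""
    return 12.0 * S / (m * n * (n + 1))

def s_from_chi_r2(chi_r2, m, n):
    """Inverse conversion: the value of S corresponding to chi_r^2."""
    return chi_r2 * m * n * (n + 1) / 12.0

# Entries of Table II (chi_r^2) mapped onto entries of Table III (S).
for chi_r2, m, n in [(8.59, 3, 5), (7.43, 4, 4), (6.012, 8, 3), (6.004, 9, 3)]:
    print(m, n, chi_r2, round(s_from_chi_r2(chi_r2, m, n), 1))
```

For n = 3 the factor n(n + 1)/12 equals 1, so S is simply m times $\chi_r^2$.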

SUMMARY 

The preceding analysis suggests that the $\chi^2$ test of the significance of
$\chi_r^2$ (or W or $\eta_r^2$), while less accurate than the z test proposed by
Kendall and Smith, is adequate for practical purposes at the .01 level of
significance if the number of sets of ranks (m) is greater than 5; and at the .05
level for any number of sets of ranks, provided the number of ranks in each set
(n) is more than 3. Exact distributions are now available for n = 3, m = 3 to
10; n = 4, m = 3 to 6; n = 5, m = 3 [1]. The .05 and .01 values of $\chi_r^2$ and S,
computed using the Kendall and Smith z test with continuity corrections, are given
in Tables II and III of the present note for n = 3 to 7 and selected values of m
from 3 to 100. For n greater than 7 and m less than 6, the z test with continuity
corrections should be employed. For all other combinations of n and m not covered
by the exact distributions or by Tables II and III, the $\chi^2$ test is adequate.
NOTES

This section is devoted to brief research and expository articles, notes on methodology 
and other short items. 


NOTE ON AN APPROXIMATE FORMULA FOR THE SIGNIFICANCE 

LEVELS OF Z 


By W. G. Cochran 


1. Introduction. An important part has been played in modern statistical
analysis by the distribution of $z = \tfrac{1}{2}\log_e (s_1^2/s_2^2)$, where
$s_1^2$ and $s_2^2$ are two independent estimates of the same variance. In
particular, all tests of significance in the analysis of variance and in multiple
regression problems are based on this distribution. Complete tabulation of the
frequency distribution of z is a heavy task, because the distribution is a
two-parameter one, the parameters being the numbers of degrees of freedom, $n_1$
and $n_2$, in the estimates $s_1^2$ and $s_2^2$. Thus each significance level of z
requires a separate two-way table. Fisher constructed a table of the 5 percent
points in 1925 [1], and this has since been extended by several workers [2] to
the 20, 1, and 0.1 percent levels for a somewhat wider range of values of $n_1$
and $n_2$.

With his original table, Fisher gave an approximate formula for the 5 percent
values of z, for high values of $n_1$ and $n_2$ outside the limits of his table.
The formula reads:

(1)  $$z\ (5\ \text{percent}) = \frac{1.6449}{\sqrt{h-1}} - 0.7843\left(\frac{1}{n_1} - \frac{1}{n_2}\right),$$

where $\dfrac{2}{h} = \dfrac{1}{n_1} + \dfrac{1}{n_2}$.


The constant 1.6449 is the 5 percent significance level for a single tail of the
normal distribution, and the constant 0.7843 will be found to be
$\tfrac{1}{6}\{2 + (1.6449)^2\}$. Thus the general formula for the significance
levels of z derivable from (1) is

$$z = \frac{x}{\sqrt{h-1}} - \frac{x^2+2}{6}\left(\frac{1}{n_1} - \frac{1}{n_2}\right),$$

where x is a normal deviate with unit standard error. By inserting the
appropriate significance level of x, this formula has been extended [2] to the
tables of the 20, 1, and 0.1 percent levels of z and commonly appears with all
published tables of z. The objects of this note are to indicate the derivation
of the formula and to suggest an improvement upon it in the latter cases.
2. The transformation of the z-distribution to normality. For high values
of $n_1$ and $n_2$, the distribution of z approaches the normal distribution,
the principal deviation being a slight skewness introduced by the inequality
of $n_1$ and $n_2$. It is therefore natural to seek an approximate formula for
the distribution of z by examining its relation to the normal distribution.
For the z-distribution the ratio $\kappa_r/\kappa_2^{r/2}$, where $\kappa_r$
is the r-th cumulant, is of the order $n^{-(r/2-1)}$, where n is the smaller
of $n_1$ and $n_2$. This property is common to a large number of distributions
which tend to normality; for example, the distribution of the mean of a sample
of size n from any distribution with finite cumulants. Fisher and Cornish [3]
have recently given a method, applicable to all distributions with this
property, for transforming the distribution to a normal distribution to any
desired order of approximation. They also obtained explicit expressions for
the significance levels of the original distribution in terms of the
significance levels of the normal distribution, discussing the z-distribution
as a particular example. The relation between z and the normal deviate x at
the same level of probability was found to be


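(2)   $z = \dfrac{x}{\sqrt{h}} - \dfrac{x^2+2}{6}\left(\dfrac{1}{n_1}-\dfrac{1}{n_2}\right) + \dfrac{1}{\sqrt{h}}\left\{\dfrac{x^3+3x}{12h} + \dfrac{x^3+11x}{144}\,h\left(\dfrac{1}{n_1}-\dfrac{1}{n_2}\right)^2\right\}$,

the three terms on the right hand side being respectively of order
$n^{-1/2}$, $n^{-1}$, and $n^{-3/2}$, so that terms of order $n^{-2}$ are
neglected.¹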

If this equation is compared with equation (1), the latter appears at first
sight to be the approximation of order $n^{-1}$ to the z-distribution, except
that the divisor of x is $\sqrt{h-1}$ in (1) and $\sqrt{h}$ in (2).
Computation of a few values shows that at the 5 percent level, equation (1) is
the better approximation. For example, for $n_1 = 40$, $n_2 = 60$, (1) gives
z (5 percent) = .2334, (2) taken to order $n^{-1}$ gives .2309, and the exact
value is .2332.
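
The comparison can be checked numerically. The following is a minimal sketch, assuming that equation (2) has the form z = x/sqrt(h) - ((x^2 + 2)/6) d + (1/sqrt(h)) {(x^3 + 3x)/(12h) + ((x^3 + 11x)/144) h d^2}, with d = 1/n1 - 1/n2 and 2/h = 1/n1 + 1/n2. The function name and the `third_term` switch are introduced here for illustration only.

```python
from math import sqrt
from statistics import NormalDist


def cornish_fisher_z(level, n1, n2, third_term=True):
    """Significance point of z from expansion (2), as assumed in the lead-in.

    With third_term=False the expansion stops at the term of order 1/n.
    """
    x = NormalDist().inv_cdf(1.0 - level)
    h = 2.0 / (1.0 / n1 + 1.0 / n2)
    d = 1.0 / n1 - 1.0 / n2
    z = x / sqrt(h) - (x ** 2 + 2.0) / 6.0 * d
    if third_term:
        bracket = (x ** 3 + 3.0 * x) / (12.0 * h) + (x ** 3 + 11.0 * x) / 144.0 * h * d * d
        z += bracket / sqrt(h)
    return z


# Truncated and full expansion at the 5 percent level for n1 = 40, n2 = 60
print(round(cornish_fisher_z(0.05, 40, 60, third_term=False), 4))
print(round(cornish_fisher_z(0.05, 40, 60), 4))
```

Under these assumptions the truncated value agrees with the .2309 quoted above, while adding the term of order n^{-3/2} brings the result close to the exact .2332.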

Since

$\dfrac{x}{\sqrt{h-1}} = \dfrac{x}{\sqrt{h}} + \dfrac{x}{2h\sqrt{h}} + \text{terms of order } n^{-2}$,

Fisher's approximation differs from (2) by including a correction term of
order $n^{-3/2}$. Inspection of the true correction terms of this order in
equation (2) shows that for finite values of $n_1$ and $n_2$ the term

$\dfrac{x^3+11x}{144}\sqrt{h}\left(\dfrac{1}{n_1}-\dfrac{1}{n_2}\right)^2$

is considerably smaller than the term

$\dfrac{x^3+3x}{12h\sqrt{h}}$,

since the former has a smaller numerical coefficient and involves the
difference between $1/n_1$ and $1/n_2$. Thus Fisher's formula gives a close
approximation to the true formula of order $n^{-3/2}$, provided that
$\dfrac{x}{2h\sqrt{h}}$ is approximately equal to
$\dfrac{x^3+3x}{12h\sqrt{h}}$, i.e. if $\dfrac{x^2+3}{6}$ is approximately
equal to 1. For the 5 percent level, x = 1.6449, and
$\dfrac{x^2+3}{6} = 0.951$. Thus at the 5 percent level the use of
$\sqrt{h-1}$ in (1) instead of $\sqrt{h}$ extends the validity of Fisher's
approximation from order $n^{-1}$ to order $n^{-3/2}$.

¹ Fisher and Cornish also gave the two succeeding terms.

This ingenious device, however, requires adjustment at other levels of
significance. The values of $(x^2+3)/6$ at the principal significance levels
are shown below.


Significance level (%)          40     30     20     10      5      1    0.1
$\lambda = (x^2 + 3)/6$       0.51   0.55   0.62   0.77   0.95   1.40   2.09


If $\sqrt{h-1}$ in formula (1) is replaced by $\sqrt{h-\lambda}$, with the
above values of $\lambda$, Fisher's formula will be approximately valid to
order $n^{-3/2}$ at all levels of significance. In particular, for the tables
already published of the 20, 1 and 0.1 percent points, $\lambda$ may be taken
as 0.6, 1.4 and 2.1 respectively. The values of z given by the use of
$\sqrt{h-1}$ and $\sqrt{h-\lambda}$ are compared below for $n_1 = 24$,
$n_2 = 60$.²


Significance        Approximate formula                Exact
level            $\sqrt{h-1}$    $\sqrt{h-\lambda}$    value
20%                 .1346            .1337             .1338
1%                  .3723            .3748             .3746
0.1%                .4875            .4966             .4955


The use of $\sqrt{h-\lambda}$ gives values practically correct to 4 decimal
places, except for the 0.1 percent level of significance, at which the higher
terms become more important.
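
The adjusted formula is easy to try out. The sketch below assumes the modification consists only of replacing h - 1 by h - lambda under the square root, with lambda = (x^2 + 3)/6 unless a rounded value is supplied; the function name `modified_fisher_z` is hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def modified_fisher_z(level, n1, n2, lam=None):
    """Fisher's formula with sqrt(h - lam) in place of sqrt(h - 1).

    If lam is omitted it is taken as (x^2 + 3)/6 for the chosen level.
    """
    x = NormalDist().inv_cdf(1.0 - level)
    if lam is None:
        lam = (x * x + 3.0) / 6.0
    h = 2.0 / (1.0 / n1 + 1.0 / n2)
    return x / sqrt(h - lam) - (x * x + 2.0) / 6.0 * (1.0 / n1 - 1.0 / n2)


# Rounded lambda values suggested in the text, for n1 = 24, n2 = 60
for level, lam in [(0.20, 0.6), (0.01, 1.4), (0.001, 2.1)]:
    print(level, round(modified_fisher_z(level, 24, 60, lam), 4))
```

Passing lam=1.0 recovers the original formula (1), which makes the two columns of the comparison table straightforward to regenerate.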

With the aid of this formula, complete tabulation of the z-distribution for a
given pair of high values of $n_1$ and $n_2$ is relatively simple. If very low
probabilities at the tails are required, the further approximations given by
Fisher and Cornish [3] may be used.

² The numerical terms in the approximate formula given for the 20 percent
points on p. 28 of Fisher and Yates' Statistical Tables are in error. Their
formula should read:

$\dfrac{0.8416}{\sqrt{h-1}} - 0.4514\left(\dfrac{1}{n_1} - \dfrac{1}{n_2}\right)$.


A NOTE ON THE ANALYSIS OF VARIANCE WITH UNEQUAL CLASS FREQUENCIES¹

By Abraham Wald²

Let us consider p groups of variates and denote by $m_j$ (j = 1, ..., p) the
number of elements in the j-th group. Let $x_{ij}$ be the i-th element of the
j-th group. We assume that $x_{ij}$ is the sum of two variates
$\epsilon_{ij}$ and $\eta_j$, i.e. $x_{ij} = \epsilon_{ij} + \eta_j$, where
$\epsilon_{ij}$ (i = 1, ..., $m_j$; j = 1, ..., p) is normally distributed
with mean $\mu$ and variance $\sigma^2$, and $\eta_j$ (j = 1, ..., p) is
normally distributed with mean $\mu'$ and variance $\sigma'^2$. All the
variates $\epsilon_{ij}$ and $\eta_j$ are supposed to be distributed
independently.

The intraclass correlation $\rho$ is given by³

$\rho = \dfrac{\sigma'^2}{\sigma^2 + \sigma'^2}$.

Confidence limits for $\rho$ have been derived only in case of equal class
frequencies, i.e. $m_1 = m_2 = \cdots = m_p$. In this paper we shall deal with
the problem of determining the confidence limits for $\rho$ in the case of
unequal class frequencies. Since $\rho$ is a monotonic function of
$\sigma'^2/\sigma^2$, our problem is solved if we derive confidence limits for
$\sigma'^2/\sigma^2$.

Denote by $\bar{x}_j$ the arithmetic mean of the j-th group, i.e.

(1)   $\bar{x}_j = \dfrac{\sum_i x_{ij}}{m_j} = \dfrac{\sum_i \epsilon_{ij}}{m_j} + \eta_j$.

Hence the variance of $\bar{x}_j$ is equal to

(2)   $\sigma^2_{\bar{x}_j} = \dfrac{\sigma^2}{m_j} + \sigma'^2$.

Denote $\sigma'^2/\sigma^2$ by $\lambda^2$. Then we have

(3)   $\sigma^2_{\bar{x}_j} = \sigma^2\left(\dfrac{1}{m_j} + \lambda^2\right) = \dfrac{\sigma^2}{w_j}$,

where

(4)   $w_j = \dfrac{m_j}{1 + m_j\lambda^2}$.

¹ The author is indebted to Professor H. Hotelling for formulating the problem
dealt with in this paper.

² Research under a grant-in-aid from the Carnegie Corporation of New York.

³ See for instance R. A. Fisher, Statistical Methods for Research Workers,
6th edition, p. 228.

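Now we shall prove that

(5)   $\dfrac{1}{\sigma^2}\sum_j w_j\left(\bar{x}_j - \dfrac{\sum_j w_j\bar{x}_j}{\sum_j w_j}\right)^2$

has the $\chi^2$-distribution with p − 1 degrees of freedom.

Let

$y_j = \sqrt{w_j}\,\bar{x}_j$   (j = 1, ..., p)

and consider the orthogonal transformation

$y'_1 = L_1(y_1, \ldots, y_p)$,
. . . . . . . . . . . . . . .
$y'_{p-1} = L_{p-1}(y_1, \ldots, y_p)$,
$y'_p = \dfrac{\sqrt{w_1}\,y_1 + \cdots + \sqrt{w_p}\,y_p}{\sqrt{w_1 + \cdots + w_p}}$,

where $L_1(y_1, \ldots, y_p), \ldots, L_{p-1}(y_1, \ldots, y_p)$ denote
arbitrary homogeneous linear functions subject to the only condition that the
transformation should be orthogonal.

Since the mean value of $y_j$ is equal to $\sqrt{w_j}\,(\mu + \mu')$ and the
variance of $y_j$ is equal to $\sigma^2$, we obviously have: the mean value of
$y'_j$ (j = 1, ..., p − 1) is equal to zero, and the variance of $y'_j$
(j = 1, ..., p) is equal to $\sigma^2$. In order to prove our statement, we
have only to show that the expression (5) is equal to
$\dfrac{1}{\sigma^2}(y_1'^2 + \cdots + y_{p-1}'^2)$. If we substitute in (5)
$y_j/\sqrt{w_j}$ for $\bar{x}_j$, we get

$\dfrac{1}{\sigma^2}\left[\sum_j y_j^2 - \dfrac{\left(\sum_j \sqrt{w_j}\,y_j\right)^2}{\sum_j w_j}\right] = \dfrac{1}{\sigma^2}\left(\sum_j y_j^2 - y_p'^2\right) = \dfrac{1}{\sigma^2}\left(y_1'^2 + \cdots + y_{p-1}'^2\right)$.

Since $\dfrac{\sum_j\sum_i (x_{ij} - \bar{x}_j)^2}{\sigma^2}$ has the
$\chi^2$-distribution with N − p degrees of freedom, the expression

(6)   $F = \dfrac{N-p}{p-1}\cdot\dfrac{\sum_j w_j\left(\bar{x}_j - \dfrac{\sum_j w_j\bar{x}_j}{\sum_j w_j}\right)^2}{\sum_j\sum_i (x_{ij}-\bar{x}_j)^2}$

has the analysis of variance distribution with p − 1 and N − p degrees of
freedom, where $N = m_1 + \cdots + m_p$. In case
$m_1 = m_2 = \cdots = m_p = m$ we have

(6')   $F = \dfrac{N-p}{p-1}\cdot\dfrac{m\sum_j(\bar{x}_j - \bar{x})^2}{\sum_j\sum_i(x_{ij}-\bar{x}_j)^2}\cdot\dfrac{1}{1+m\lambda^2} = \dfrac{F^*}{1+m\lambda^2}$,

where $\bar{x} = \dfrac{\sum_j\sum_i x_{ij}}{N}$ and
$F^* = \dfrac{N-p}{p-1}\cdot\dfrac{m\sum_j(\bar{x}_j-\bar{x})^2}{\sum_j\sum_i(x_{ij}-\bar{x}_j)^2}$.

Hence

$\lambda^2 = \left(\dfrac{F^*}{F} - 1\right)\dfrac{1}{m}$.

If $F_1$ denotes the lower and $F_2$ the upper confidence limit of F, we
obtain for $\lambda^2$ the confidence limits

$\left(\dfrac{F^*}{F_2} - 1\right)\dfrac{1}{m}$   and   $\left(\dfrac{F^*}{F_1} - 1\right)\dfrac{1}{m}$.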
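
For equal class frequencies the limits are available in closed form, and a short sketch may make the arithmetic concrete. It assumes the limits are (F*/F_2 - 1)/m and (F*/F_1 - 1)/m, that F_1 and F_2 are supplied by the user from tables of the variance-ratio distribution with p - 1 and N - p degrees of freedom, and that a negative limit is replaced by zero. That last point is an assumption made here, since a variance ratio cannot be negative. The function name and the sample data are hypothetical.

```python
def lambda2_limits_equal(groups, f_lower, f_upper):
    """Confidence limits for lambda^2 = sigma'^2 / sigma^2, equal class sizes.

    f_lower and f_upper play the role of F_1 and F_2; they are inputs here
    and must be looked up for p - 1 and N - p degrees of freedom.
    """
    p = len(groups)
    m = len(groups[0])
    if any(len(g) != m for g in groups):
        raise ValueError("all groups must have the same size")
    n_total = p * m
    means = [sum(g) / m for g in groups]
    grand = sum(means) / p
    between = m * sum((xb - grand) ** 2 for xb in means)
    within = sum((x - xb) ** 2 for g, xb in zip(groups, means) for x in g)
    f_star = (n_total - p) / (p - 1) * between / within
    lower = (f_star / f_upper - 1.0) / m
    upper = (f_star / f_lower - 1.0) / m
    # Clipping at zero is an assumption of this sketch
    return f_star, max(lower, 0.0), max(upper, 0.0)


# Hypothetical data; F_1 = 1 and F_2 = 4 are illustrative, not table values
data = [[1.0, 3.0], [5.0, 7.0], [9.0, 11.0]]
print(lambda2_limits_equal(data, 1.0, 4.0))
```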


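Let us now consider the general case that $m_1, \ldots, m_p$ are arbitrary
positive integers. First we shall show that the set of values of $\lambda^2$,
for which (6) lies between its confidence limits $F_1$ and $F_2$, is an
interval. For this purpose we have only to show that

$f(\lambda^2) = \sum_j w_j\left(\bar{x}_j - \dfrac{\sum_j w_j\bar{x}_j}{\sum_j w_j}\right)^2$

is monotonically decreasing with $\lambda^2$. In fact

$\dfrac{\partial f}{\partial w_j} = \left(\bar{x}_j - \dfrac{\sum_k w_k\bar{x}_k}{\sum_k w_k}\right)^2$.

Since

$\dfrac{dw_j}{d\lambda^2} = -w_j^2$,

we have

$\dfrac{df(\lambda^2)}{d\lambda^2} = \sum_j \dfrac{\partial f}{\partial w_j}\,\dfrac{dw_j}{d\lambda^2} = -\sum_j w_j^2\left(\bar{x}_j - \dfrac{\sum_k w_k\bar{x}_k}{\sum_k w_k}\right)^2 < 0$,

which proves our statement.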





Hence the lower confidence limit $\lambda_1^2$ of $\lambda^2$ is given by the
root of the equation in $\lambda^2$:

(7)   $F = F_2$,

and the upper confidence limit $\lambda_2^2$ of $\lambda^2$ is given by the
root of the equation in $\lambda^2$:

(8)   $F = F_1$.

Since $f(\lambda^2)$ is monotonically decreasing, the equations (7) and (8)
have at most one root in $\lambda^2$. If the equation (7) or (8) has no root,
the corresponding confidence limit has to be put equal to zero. If neither (7)
nor (8) has a root, we have to reject at least one of the hypotheses:

(1) $x_{ij} = \epsilon_{ij} + \eta_j$.

(2) The variates $\epsilon_{ij}$ and $\eta_j$ (i = 1, ..., $m_j$;
j = 1, ..., p) are normally and independently distributed.

(3) Each of the variates $\epsilon_{ij}$ has the same distribution.

(4) Each of the variates $\eta_j$ has the same distribution.

The equations (7) and (8) are complicated algebraic equations in $\lambda^2$.
For the actual calculation of the roots of these equations, well known
approximation methods can be applied, making use also of the fact that the
left members are monotonic functions of $\lambda^2$. In applying any
approximation method it is very useful to start with two limits of the root
which do not lie far apart. We shall give here a method of finding such
limits.

Denote by $\bar{F}$ the function which we obtain from F (formula (6)) by
substituting

$\dfrac{l_j}{1 + l_j\lambda^2}$ for $w_j$   (j = 1, ..., p).

Let $\bar{f}$ be the function obtained from f by the same process. Denote by
$\varphi(m, \lambda^2)$ the function which we obtain from $\bar{F}$ by
substituting m for $l_1, \ldots, l_p$. We shall first show that $\bar{F}$ is
non-decreasing with increasing $l_k$ (k = 1, ..., p), i.e.
$\partial\bar{F}/\partial l_k \geq 0$. For this purpose we have only to show
that $\partial\bar{f}/\partial l_k \geq 0$. We have:

$\dfrac{\partial\bar{f}}{\partial l_k} = \dfrac{\partial\bar{f}}{\partial w_k}\,\dfrac{dw_k}{dl_k} = \left(\bar{x}_k - \dfrac{\sum_j w_j\bar{x}_j}{\sum_j w_j}\right)^2\dfrac{1}{(1 + l_k\lambda^2)^2} \geq 0$.

Hence our statement is proved. Denote by m' the smallest and by m'' the
greatest of the values $m_1, \ldots, m_p$. Then we obviously have

(9)   $\varphi(m', \lambda^2) \leq F \leq \varphi(m'', \lambda^2)$.

Denote by $\lambda_1'^2$, $\lambda_1''^2$, $\lambda_2'^2$, $\lambda_2''^2$ the
roots in $\lambda^2$ of the following equations respectively:

$\varphi(m', \lambda^2) = F_2$;   $\varphi(m'', \lambda^2) = F_2$;
$\varphi(m', \lambda^2) = F_1$;   $\varphi(m'', \lambda^2) = F_1$.

Since F is monotonically decreasing with increasing $\lambda^2$, on account of
(7), (8), and (9) we obviously have

$\lambda_1'^2 \leq \lambda_1^2 \leq \lambda_1''^2$

and

$\lambda_2'^2 \leq \lambda_2^2 \leq \lambda_2''^2$.

The above inequalities give us the required limits.
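
The bracketing idea lends itself to a simple numerical procedure. What follows is a minimal sketch, assuming F has the form (N - p)/(p - 1) times the weighted between-group sum of squares over the within-group sum of squares, with weights m_j/(1 + m_j lambda^2). Bisection is one choice among the "well known approximation methods" and is not prescribed by the note. The closed-form root of phi(m, lambda^2) = c used for the bracket follows from phi(m, lambda^2) = A m/(1 + m lambda^2), where A is the unweighted variance ratio; all function names and the data in the usage example are hypothetical.

```python
def f_stat(groups, lam2):
    """F of formula (6) for a trial value lam2 of lambda^2."""
    p = len(groups)
    n_total = sum(len(g) for g in groups)
    means = [sum(g) / len(g) for g in groups]
    weights = [len(g) / (1.0 + len(g) * lam2) for g in groups]
    centre = sum(w * xb for w, xb in zip(weights, means)) / sum(weights)
    between = sum(w * (xb - centre) ** 2 for w, xb in zip(weights, means))
    within = sum((x - xb) ** 2 for g, xb in zip(groups, means) for x in g)
    return (n_total - p) / (p - 1) * between / within


def phi_root(groups, m, target):
    """Root in lambda^2 of phi(m, lambda^2) = target (closed form)."""
    p = len(groups)
    n_total = sum(len(g) for g in groups)
    means = [sum(g) / len(g) for g in groups]
    centre = sum(means) / p
    between = sum((xb - centre) ** 2 for xb in means)
    within = sum((x - xb) ** 2 for g, xb in zip(groups, means) for x in g)
    a = (n_total - p) / (p - 1) * between / within
    return a / target - 1.0 / m


def solve_lambda2(groups, target, iterations=200):
    """Root of F(lambda^2) = target by bisection; zero if no root exists."""
    if f_stat(groups, 0.0) <= target:
        return 0.0                      # limit is put equal to zero
    sizes = [len(g) for g in groups]
    lo = max(phi_root(groups, min(sizes), target), 0.0)   # bracket from m'
    hi = phi_root(groups, max(sizes), target)             # bracket from m''
    for _ in range(iterations):
        mid = 0.5 * (lo + hi)
        if f_stat(groups, mid) > target:   # F is decreasing in lambda^2
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


# Hypothetical data with unequal class frequencies
data = [[4.1, 5.0, 6.2], [7.9, 8.4], [5.5, 6.1, 6.6, 7.0]]
target = f_stat(data, 0.5)
print(solve_lambda2(data, target))
```

Calling `solve_lambda2` once with the upper limit F_2 and once with the lower limit F_1 would give the lower and upper limits for lambda^2 respectively, in the sense of equations (7) and (8).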

Columbia University,
New York, N. Y.


THE DISTRIBUTION OF QUADRATIC FORMS IN NON-CENTRAL
NORMAL RANDOM VARIABLES

By William G. Madow

The following theorem is the algebraic basis of the theorem of R. A. Fisher
and W. G. Cochran which states necessary and sufficient conditions that a set
of quadratic forms in normally and independently distributed random variables
should themselves be independently distributed in $\chi^2$-distributions.²

Theorem I. If the real quadratic forms $q_1, \ldots, q_m$, in
$x_1, \ldots, x_n$,¹ are such that

(1)   $\sum_\gamma q_\gamma = \sum_\nu x_\nu^2$,

and if the rank of $q_\gamma$ is $n_\gamma$, then a necessary and sufficient
condition that

(2)   $q_\gamma = \sum_\alpha z_\alpha^2$,

where the real linear functions $z_\beta$ of the $x_\nu$ are defined by

(3)   $x_\nu = \sum_\beta c_{\nu\beta} z_\beta$,

is

(4)   $n' = n$.

Furthermore the system of linear forms (3) constitute an orthogonal
transformation.

¹ The letters i, j, $\mu$, $\nu$ will assume all integral values from 1
through n, the letter $\gamma$ will assume all integral values from 1 through
m ($n \geq m$), the letter $\alpha$ will assume all integral values from
$n_1 + \cdots + n_{\gamma-1} + 1$ through $n_1 + \cdots + n_\gamma$
($n_0 = 0$, $n_1 + \cdots + n_m = n'$), the letters $\beta$, $\beta'$ will
assume all integral values from 1 through n', and the letters r, s will assume
all integral values from 1 through n − 1.

² The references are: W. G. Cochran, "The Distribution of Quadratic Forms in a
Normal System, with Applications to the Analysis of Covariance," Proc. Camb.
Phil. Soc., Vol. 30 (1934), pp. 178-191, and R. A. Fisher, "Applications of
'Student's' Distribution," Metron, Vol. 5 (1926), pp. 90-104.

Proof: Necessity. Since the rank of a sum of quadratic forms is less than
or equal to the sum of their ranks, it follows that $n' \geq n$. Upon
substituting from (3) for the x's in (1), and using (2), it is seen that, for
all values of the z's,

$\sum_\beta z_\beta^2 = \sum_\nu \sum_{\beta,\beta'} c_{\nu\beta}\,c_{\nu\beta'}\,z_\beta z_{\beta'}$,

and hence, from (1), it follows that

(5)   $\sum_\nu c_{\nu\beta}\,c_{\nu\beta'} = \delta_{\beta\beta'}$,

where $\delta_{\beta\beta'} = 0$ if $\beta \neq \beta'$, and = 1 if
$\beta = \beta'$. However, since the rank of the system of linear forms (3) is
not greater than n, and since the matrix of (5) is the product of the matrix
of (3) by its transposed matrix, it follows that (5) can be true only if n' is
not greater than n. Consequently n' = n. It then is an immediate result of (5)
that the transformation (3) is orthogonal.

Sufficiency. We assume that n' = n. By a real linear transformation of
$x_1, \ldots, x_n$ we obtain linear forms $z_\alpha$ such that

$q_\gamma = \sum_\alpha c_\alpha z_\alpha^2$,

where $c_\alpha = 1$ or −1. The set of linear functions $z_1, \ldots, z_n$ are
linearly independent, for if $z_n \not\equiv 0$, and if real numbers
$h_1, \ldots, h_{n-1}$, not all zero, exist such that, say,

$z_n = \sum_r h_r z_r$,

then

$\sum_\nu c_\nu z_\nu^2 = \sum_{r,s} H_{rs}\,z_r z_s$.

Substituting, we have

$\sum_\gamma q_\gamma = \sum_\nu x_\nu^2 = \sum_{r,s}\sum_{\mu,\nu} H_{rs}\,c^{r\mu} c^{s\nu}\,x_\mu x_\nu$,

where $z_\nu = \sum_\mu c^{\nu\mu} x_\mu$. (It is not assumed here that the
matrix of the $c^{\nu\mu}$ is the inverse of the matrix of the
$c_{\nu\beta}$. That fact is a consequence of this proof.) Denoting the matrix
of $z_1, \ldots, z_{n-1}$ by $C_{n-1}$ we see that the matrix of
$\sum_\gamma q_\gamma$ is $C'_{n-1} H C_{n-1}$, where H is the matrix of the
$H_{rs}$, and has rank less than or equal to n − 1, which contradicts the
hypothesis. Hence if C is the matrix having the elements $c_\alpha$ in its
main diagonal and zeros elsewhere and if $C_n$ is the matrix of
$z_1, \ldots, z_n$, it follows that

$C'_n C C_n = I$,

where I is the identity matrix, i.e. the matrix having ones in the main
diagonal and zeros elsewhere, and $C_n$ is non-singular. Then
$C = C_n'^{-1} C_n^{-1}$, and hence C is the identity matrix and $C_n$ is
orthogonal.
Among the hypotheses of the Fisher-Cochran theorem is the hypothesis that
the mean value of $x_\nu$ is 0, and the variance of $x_\nu$ is $\sigma^2$.
However, in connection with his analysis of the distribution of the multiple
correlation coefficient,³ R. A. Fisher derived the distribution of the sum of
the squares of n independently distributed random variables
$x_1, \ldots, x_n$, the probability density of $x_\nu$ being given by

(6)   $p(x_\nu) = (2\pi\sigma^2)^{-1/2}\exp\left[-\dfrac{1}{2\sigma^2}(x_\nu - a_\nu)^2\right]$.

More recently, P. C. Tang⁴ has used the distribution of the sum of non-central
squares in his study of the power function of the analysis of variance test.
In this note we extend the Fisher-Cochran theorem to non-central random
variables. If the random variables $x_\nu$ are independently distributed with
probability densities given by (6), Fisher and Tang have shown that if
$\chi'^2 = \dfrac{1}{2\sigma^2}\sum_\nu x_\nu^2$, then the probability density
of $\chi'^2$ is given by

(7)   $p(\chi'^2) = e^{-\lambda - \chi'^2}\,(\chi'^2)^{\frac{1}{2}n - 1}\sum_{\nu=0}^{\infty}\dfrac{(\lambda\chi'^2)^\nu}{\nu!\,\Gamma(\tfrac{1}{2}n + \nu)}$,

where $\lambda = \dfrac{1}{2\sigma^2}\sum_\nu a_\nu^2$.

We now give necessary and sufficient conditions that a set of quadratic forms
in normally and independently distributed random variables should themselves
be independently distributed in $\chi'^2$-distributions.

Theorem II. Let $x_1, \ldots, x_n$ be independently distributed random
variables, the random variable $x_\nu$ having probability density (6). Denote
$\sum_\nu x_\nu^2$ by q, and denote $\dfrac{1}{2\sigma^2}\sum_\nu a_\nu^2$ by
$\lambda$. Let $q_1, \ldots, q_m$ be quadratic forms,

$q_\gamma = \sum_{\mu,\nu} a^\gamma_{\mu\nu}\,x_\mu x_\nu$,

such that $\sum_\gamma q_\gamma = q$, and let the rank of $q_\gamma$ be
denoted by $n_\gamma$.

A necessary and sufficient condition that the quadratic forms
$\chi_\gamma'^2$ ($\chi_\gamma'^2 = q_\gamma/2\sigma^2$) be independently
distributed with joint probability density

(8)   $p(\chi_1'^2, \ldots, \chi_m'^2) = \prod_\gamma p(\chi_\gamma'^2)$,

where $p(\chi_\gamma'^2)$ is given by (7) with $n_\gamma$ and
$\lambda_\gamma$ in place of n and $\lambda$, and

(9)   $\lambda_\gamma = \dfrac{1}{2\sigma^2}\sum_{\mu,\nu} a^\gamma_{\mu\nu}\,a_\mu a_\nu$,

is n' = n.

Proof. Necessity. Tang⁵ has shown that the distribution of $\chi'^2$ is given
by (7) and that if the $\chi_\gamma'^2$ have joint distribution (8), then the
distribution of $\chi_1'^2 + \cdots + \chi_m'^2$ ($= \chi'^2$) is (7) with n'
in place of n. Upon comparing terms, we see that n' = n.

³ R. A. Fisher, "The General Sampling Distribution of the Multiple Correlation
Coefficient," Proc. Roy. Soc. London, A, Vol. 121 (1928), pp. 654-673.

⁴ P. C. Tang, "The Power Function of the Analysis of Variance Tests with
Tables and Illustrations of Their Use," Statistical Research Memoirs, Vol. 2
(1938), pp. 126-149.

Sufficiency. By Theorem I there exist n orthogonal linear functions (3) such
that (2) is true. Then it is easy to see that the random variables
$z_1, \ldots, z_n$ are independently distributed with a joint probability
density

(10)   $p(z_1, \ldots, z_n) = (2\pi\sigma^2)^{-\frac{1}{2}n}\exp\left[-\dfrac{1}{2\sigma^2}\sum_\nu (z_\nu - a'_\nu)^2\right]$,

where

$\sum_\nu a_\nu'^2 = \sum_\nu a_\nu^2$,   and   $a'_\alpha = \sum_\mu c_{\mu\alpha}\,a_\mu$.

If we set $2\sigma^2\lambda_\gamma = \sum_\alpha a_\alpha'^2$, then we have,
from (7) and (10), that the $\chi_\gamma'^2$ are independently distributed
with joint probability density (8). It is only necessary to show that
$\sum_\alpha a_\alpha'^2 = \sum_{\mu,\nu} a^\gamma_{\mu\nu}\,a_\mu a_\nu$ in
order to complete the proof of the theorem. Now

$\sum_\alpha a_\alpha'^2 = \sum_{\mu,\nu}\left(\sum_\alpha c_{\mu\alpha}\,c_{\nu\alpha}\right) a_\mu a_\nu$.

On the other hand, by direct substitution for the z's we see that

$q_\gamma = \sum_\alpha z_\alpha^2 = \sum_{\mu,\nu}\left(\sum_\alpha c_{\mu\alpha}\,c_{\nu\alpha}\right) x_\mu x_\nu$,

and hence $a^\gamma_{\mu\nu} = \sum_\alpha c_{\mu\alpha}\,c_{\nu\alpha}$.
Since (3) is an orthogonal transformation,


^ CipCjy X/ ( CpaCva) CipCjy — ^ j 

where $\delta_{\alpha i} = 0$ if $\alpha \neq i$, and = 1 if $\alpha = i$,
which completes the proof.

It is emphasized that the form of $\lambda_\gamma$ makes it unnecessary to
calculate the matrix of $q_\gamma$ to determine $\lambda_\gamma$, since the
values $a_\nu$ need only be substituted for the $x_\nu$ in the original
expression for $q_\gamma$ to determine $\lambda_\gamma$.
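
A small numerical illustration of this remark may be helpful. The sketch takes the familiar decomposition of the sum of squares into q_1 = n (mean)^2, of rank 1, and q_2 = the sum of squared deviations from the mean, of rank n - 1, and assumes lambda_gamma equals q_gamma evaluated at the means a_nu, divided by 2 sigma^2. The means, the variance and all names below are hypothetical.

```python
def quad_form(matrix, v):
    """Value of the quadratic form with the given symmetric matrix at v."""
    n = len(v)
    return sum(matrix[i][j] * v[i] * v[j] for i in range(n) for j in range(n))


n = 4
sigma2 = 2.0                        # hypothetical common variance
a = [1.0, 2.0, 0.5, -1.5]           # hypothetical means a_nu

# Matrices of q_1 = n * xbar^2 and q_2 = sum (x - xbar)^2; they add up to I
A1 = [[1.0 / n for _ in range(n)] for _ in range(n)]
A2 = [[(1.0 if i == j else 0.0) - 1.0 / n for j in range(n)] for i in range(n)]

lam = sum(v * v for v in a) / (2.0 * sigma2)     # lambda for q itself
lam1 = quad_form(A1, a) / (2.0 * sigma2)         # substitute a for x in q_1
lam2 = quad_form(A2, a) / (2.0 * sigma2)         # substitute a for x in q_2

print(lam1, lam2, lam)
```

Since the ranks 1 and n - 1 add up to n, the condition n' = n holds for this decomposition, and the two non-centrality parameters add up to that of the total sum of squares.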

Washington, D. C. 


⁵ See footnote 4, p. 140.


TWO PROPERTIES OF SUFFICIENT STATISTICS

By Louis Olshevsky

The concept of sufficient statistics was introduced by R. A. Fisher in 1922.
It was refined and extended in 1936 by Neyman and Pearson who gave definitions
of shared sufficient statistics and sufficient sets of algebraically
independent statistics.¹ Today the concept plays an important part in the
theory of the subject. Characterized briefly, a statistic associated with a
single or specific population parameter is sufficient when no other statistic
calculated from the same sample sheds any additional light on the value of the
parameter. We shall prove that sets of sufficient statistics possess certain
interconnections so that when one set is known every other set with a like
number of members and linked with the same population parameters is
discoverable.

Theorem I. If $T_1, \ldots, T_m$ are a set of m ($m \leq n$) algebraically
independent sufficient statistics with regard to the parameters
$\theta_1, \ldots, \theta_q$ and the probability law
$p(x_1, \ldots, x_n \mid \theta_1, \ldots, \theta_q, \ldots, \theta_l)$, a
necessary and sufficient condition for the sufficiency of any set of m
algebraically independent statistics $T'_1, \ldots, T'_m$ with regard to the
same parameters and the same probability distribution is that the $T'_i$ be a
set of independent functions of the $T_j$ (i, j = 1, ..., m).

Proof. As an adjunct in the demonstration we cite the following theorem due to
Neyman.² For a set of algebraically independent statistics
$T_1, \ldots, T_m$ to be a sufficient set with regard to the parameters
$\theta_1, \ldots, \theta_q$, it is necessary and sufficient that in any point
of sample space, except perhaps for a set of measure zero, it should be
possible to present the probability law in the form of the product

(1)   $p(x_1, \ldots, x_n \mid \theta_1, \ldots, \theta_q, \ldots, \theta_l) = p(T_1, \ldots, T_m \mid \theta_1, \ldots, \theta_q)\,\phi(x_1, \ldots, x_n;\ \theta_{q+1}, \ldots, \theta_l)$,

where $p(T_1, \ldots, T_m \mid \theta_1, \ldots, \theta_q)$ is the probability
law of $T_1, \ldots, T_m$ and the function $\phi$ does not depend upon
$\theta_1, \ldots, \theta_q$.

The sufficiency of the condition stated in the hypothesis of Theorem I is now
immediately evident. For, if p' and $\phi'$ refer to the second set of
algebraically independent statistics and
$T'_i = T'_i(T_1, \ldots, T_m)$ where the functions are independent, the
relations can be solved for the $T_i$ in terms of the $T'_i$, giving
$T_i = T_i(T'_1, \ldots, T'_m)$,

$p'(T'_1, \ldots, T'_m \mid \theta_1, \ldots, \theta_q) = p[T_1(T'_1, \ldots, T'_m), \ldots, T_m(T'_1, \ldots, T'_m) \mid \theta_1, \ldots, \theta_q]\,\dfrac{\partial(T_1, \ldots, T_m)}{\partial(T'_1, \ldots, T'_m)}$

and

(2)   $p(x_1, \ldots, x_n \mid \theta_1, \ldots, \theta_q, \ldots, \theta_l) = p'(T'_1, \ldots, T'_m \mid \theta_1, \ldots, \theta_q)\,\phi'(x_1, \ldots, x_n;\ \theta_{q+1}, \ldots, \theta_l)$.

¹ See Neyman and Pearson: "Sufficient Statistics and Uniformly Most Powerful
Tests of Statistical Hypotheses," Statistical Research Memoirs of the
University of London, June 1936. The notation of the present paper is taken
from this article.

² See Neyman's article in the Giornale dell'Istituto Italiano degli Attuari,
Vol. VI, No. 4 (1935) as well as the memoir referred to in footnote 1.

Proof of the necessity is somewhat more involved. Since the $T_i$ and $T'_i$
are both sets of algebraically independent sufficient statistics with regard
to $\theta_1, \ldots, \theta_q$, equations (1) and (2) are satisfied. They
are, in fact, identities when the values of $T_1, \ldots, T_m$ and
$T'_1, \ldots, T'_m$ in terms of the $x_i$ are substituted. Division of (1) by
(2) and multiplication leads to the equation

(3)   $\dfrac{p(T_1, \ldots, T_m \mid \theta_1, \ldots, \theta_q)}{p'(T'_1, \ldots, T'_m \mid \theta_1, \ldots, \theta_q)} = \dfrac{\phi'(x_1, \ldots, x_n;\ \theta_{q+1}, \ldots, \theta_l)}{\phi(x_1, \ldots, x_n;\ \theta_{q+1}, \ldots, \theta_l)}$.

The right side of (3) is free of $\theta_1, \ldots, \theta_q$. Therefore, in
reality the left side must be too. If some or all of the parameters
$\theta_1, \ldots, \theta_q$ enter formally into the left side, we can choose
m + 1 sets of values $\theta_1^t, \ldots, \theta_q^t$ (t = 1, ..., m + 1) such
that each of the m + 1 functions
$p(T_1, \ldots, T_m \mid \theta_1^t, \ldots, \theta_q^t) \div p'(T'_1, \ldots, T'_m \mid \theta_1^t, \ldots, \theta_q^t)$
differs formally from all of the others. We can, then, since each is equal to
the right side of (3) which is free of $\theta_1, \ldots, \theta_q$, equate
any one of these functions to the remaining m in turn. This provides m
independent equations whose very existence proves that the $T'_i$ are
functions of the $T_j$, and vice versa.

If none of the parameters $\theta_1, \ldots, \theta_q$ enters formally into
the left side of (3), $p(T_1, \ldots, T_m \mid \theta_1, \ldots, \theta_q)$
must be of the form $p(T_1, \ldots, T_m)\,g(\theta_1, \ldots, \theta_q)$ and
$p'(T'_1, \ldots, T'_m \mid \theta_1, \ldots, \theta_q)$ of the form
$p'(T'_1, \ldots, T'_m)\,g(\theta_1, \ldots, \theta_q)$. In this case the
original probability law
$p(x_1, \ldots, x_n \mid \theta_1, \ldots, \theta_q, \ldots, \theta_l)$
contains $\theta_1, \ldots, \theta_q$ only nominally and there can be no talk
of any statistics designed to estimate these parameters either singly or in
combination.

When m = 1 and the set of algebraically independent statistics reduces to
one, the single statistic is termed a shared sufficient statistic of the
parameters $\theta_1, \ldots, \theta_q$.³ For this special case, Theorem I can
be restated as follows: If T is a shared sufficient statistic with regard to
the population parameters $\theta_1, \ldots, \theta_q$ and the probability
distribution
$p(x_1, \ldots, x_n \mid \theta_1, \ldots, \theta_q, \ldots, \theta_l)$, the
necessary and sufficient condition for the sufficiency of any statistic T'
with regard to the same parameters and the same probability distribution is
that T' be a function of T. When m and q both equal one, the statistic becomes
a sufficient statistic in the sense originally defined by Fisher in 1922.

A physical law is independent of the coordinate system used to express it.
This fact is taken account of in modern physics through the employment of
tensors. One might hope for a parallel situation in the relation between
sufficient statistics and the probability law to which they refer. Given any
l-parameter family of distribution laws
$p(x_1, \ldots, x_n \mid \theta_1, \ldots, \theta_l)$, the substitution
$\theta_i = \theta_i(\theta'_1, \ldots, \theta'_l)$ (i = 1, ..., l) leads to
the equally valid representation of the family

$p'(x_1, \ldots, x_n \mid \theta'_1, \ldots, \theta'_l) = p[x_1, \ldots, x_n \mid \theta_1(\theta'_1, \ldots, \theta'_l), \ldots, \theta_l(\theta'_1, \ldots, \theta'_l)]$.

Is a set of statistics sufficient with respect to the first representation
also sufficient with respect to the second? The answer is partly in the
affirmative and is given by the following proposition.

³ See the memoir mentioned in footnote 1.

Theorem II. If the set of algebraically independent statistics
$T_1, \ldots, T_m$ is sufficient with regard to the parameters
$\theta_1, \ldots, \theta_q$ and the probability law
$p(x_1, \ldots, x_n \mid \theta_1, \ldots, \theta_q, \ldots, \theta_l)$, it is
also sufficient with regard to $\theta'_1, \ldots, \theta'_q$ and any other
representation
$p'(x_1, \ldots, x_n \mid \theta'_1, \ldots, \theta'_q, \ldots, \theta'_l)$ of
the same probability law, provided $\theta'_i$ (i = 1, ..., q) are independent
functions of $\theta_1, \ldots, \theta_q$ only and $\theta'_j$
(j = q + 1, ..., l) are functions of $\theta_{q+1}, \ldots, \theta_l$ only.

Proof. The proof of the theorem is obvious. We are given the fact that

$p(x_1, \ldots, x_n \mid \theta_1, \ldots, \theta_q, \ldots, \theta_l) = p(T_1, \ldots, T_m \mid \theta_1, \ldots, \theta_q)\,\phi(x_1, \ldots, x_n;\ \theta_{q+1}, \ldots, \theta_l)$.

Since the $\theta'_i$ (i = 1, ..., q) are functions of
$\theta_1, \ldots, \theta_q$ only and the $\theta'_j$ (j = q + 1, ..., l) are
functions of $\theta_{q+1}, \ldots, \theta_l$ only, it follows that
$\theta_i = \theta_i(\theta'_1, \ldots, \theta'_q)$ (i = 1, ..., q) and
$\theta_j = \theta_j(\theta'_{q+1}, \ldots, \theta'_l)$ (j = q + 1, ..., l).
Consequently,

$p'(x_1, \ldots, x_n \mid \theta'_1, \ldots, \theta'_l) = p'(T_1, \ldots, T_m \mid \theta'_1, \ldots, \theta'_q)\,\phi'(x_1, \ldots, x_n;\ \theta'_{q+1}, \ldots, \theta'_l)$

and the theorem is established.

New York, N. Y. 


NOTE ON THE MOMENTS OF A BINOMIALLY DISTRIBUTED VARIATE

By W. D. Evans

J. A. Joseph has given two interesting triangular arrangements of numbers,
the second of which is reproduced herewith as Table 1.¹ The successive rows
in this table are the coefficients in the expansion of $x^n$ as a function of
the factorials $x^{(i)}$, using the notation of the calculus of finite
differences. For example,

$x^4 = x^{(4)} + 6x^{(3)} + 7x^{(2)} + x$,

where

$x^{(i)} = x(x - 1)(x - 2) \cdots (x - i + 1)$.

Joseph points out that the coefficients may be used to generate the numbers
of Laplace.

¹ J. A. Joseph, "On the Coefficients of the Expansion of $X^{(n)}$," Annals of
Math. Stat., Vol. X (1939), p. 293.

A general expression defining any of the coefficients in terms of its place of
occurrence in Table 1 may be set up. If we denote by $F_c(r)$ the number in
row r and column c of the table, we have

(1)   $F_c(r) = \sum_{k_1=1}^{r-c+1} k_1 \sum_{k_2=1}^{k_1} k_2 \cdots \sum_{k_{c-1}=1}^{k_{c-2}} k_{c-1}$   ($r \geq c$).

TABLE 1

  r \ c     1        2        3        4        5      ...     c
    1       1
    2       1        1
    3       1        3        1
    4       1        6        7        1
    5       1       10       25       15        1
   ...
    r     F_1(r)   F_2(r)   F_3(r)   F_4(r)   F_5(r)   ...   F_c(r)

This expression is of additional interest since the numbers defined by it are
likewise the coefficients in the expression of the i-th moment about the
origin of a binomially distributed variate in terms of the probability of the
variate and the size of the sample in which it is contained. For example, it
may be easily verified that if $\alpha$ is such a variate, p its probability
of occurrence, and n the size of the sample in which it is contained,

$E(\alpha^2) = n^{(2)}p^2 + np$

$E(\alpha^3) = n^{(3)}p^3 + 3n^{(2)}p^2 + np$

$E(\alpha^4) = n^{(4)}p^4 + 6n^{(3)}p^3 + 7n^{(2)}p^2 + np$

and so on.
and so on. 

Ordinarily, computation of the higher moments of a binomially distributed
variate is a tedious process of repeated differentiation. However, equation
(1) immediately permits us to generalize the foregoing expressions to give the
i-th moment of $\alpha$ as follows:

(2)   $E(\alpha^i) = \sum_{t=0}^{i-1} n^{(i-t)} p^{i-t} \sum_{k_1=1}^{i-t} k_1 \sum_{k_2=1}^{k_1} k_2 \cdots \sum_{k_t=1}^{k_{t-1}} k_t$.

It will be noted that when c − 1 in equation (1) and t in equation (2) are
equal to zero, the repeated summations vanish to be replaced by the value one.

By means of equation (2) much of the labor usually involved in expressing
the i-th moment about the origin of a binomially distributed variate in terms
of n and p may be avoided.
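
The repeated summations are simple to evaluate mechanically. The sketch below assumes that equation (1) is a chain of c - 1 nested sums with upper limits r - c + 1, k_1, k_2, and so on, and that equation (2) weights each such chain by the factorial power of n times the matching power of p. The function names are hypothetical, and the direct summation over the binomial probabilities is included only as a check.

```python
from math import comb


def nested_sum(top, depth):
    """Sum over k1 <= top of k1 * (sum over k2 <= k1 of k2 * ...), depth levels."""
    if depth == 0:
        return 1                  # an empty repeated summation counts as one
    return sum(k * nested_sum(k, depth - 1) for k in range(1, top + 1))


def coefficient(r, c):
    """F_c(r): entry in row r and column c of Table 1, for r >= c."""
    return nested_sum(r - c + 1, c - 1)


def falling(n, k):
    """Factorial power n(n - 1)...(n - k + 1)."""
    out = 1
    for j in range(k):
        out *= n - j
    return out


def binomial_moment(i, n, p):
    """i-th moment about the origin via the form assumed for equation (2)."""
    return sum(falling(n, i - t) * p ** (i - t) * nested_sum(i - t, t)
               for t in range(i))


def binomial_moment_direct(i, n, p):
    """Same moment by direct summation over the binomial probabilities."""
    return sum(k ** i * comb(n, k) * p ** k * (1 - p) ** (n - k)
               for k in range(n + 1))


print([coefficient(5, c) for c in range(1, 6)])   # fifth row of Table 1
print(binomial_moment(4, 6, 0.3), binomial_moment_direct(4, 6, 0.3))
```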


Washington, D. C, 



REPORT OF THE ANNUAL MEETING OF THE INSTITUTE 

The fifth annual meeting of the Institute of Mathematical Statistics was 
held in Philadelphia, Pennsylvania, on December 27 and 28, 1939, in conjunc¬ 
tion with the meetings of the American Statistical Association, the Econometric 
Society, and the American Sociological Society. The program for the meeting 
was arranged by Professor C. C. Craig. 

On Wednesday morning, December 27, the Institute held a session devoted to
contributed papers on Statistical Theory and Methodology. Professor P. R.
Rider, President of the Institute, presided. At that time the following papers
were presented:

1. On the unbiased character of certain likelihood-ratio tests when applied
   to normal systems.
   Joseph F. Daly, The Catholic University of America.

2. The product seminvariants of the mean and a central moment in samples.
   C. C. Craig, University of Michigan.

3. A method for minimizing the sum of absolute values of deviations.
   Robert Singleton, Princeton Local Government Survey.

4. On certain criteria for testing the homogeneity of k estimates of variance.
   C. Eisenhart and Frieda S. Swed, University of Wisconsin.

5. On a test whether two samples are from the same population.
   A. Wald and J. Wolfowitz, Columbia University and Brooklyn, New York.

6. The power functions of certain tests of significance in harmonic analysis
   and lag correlation.
   William G. Madow, Washington, D. C.

7. Some theoretical aspects of the use of transformations in the statistical
   analysis of replicated experiments.
   W. G. Cochran, Iowa State College.

8. The standard errors of geometric and harmonic types of index numbers.
   Nilan Norris, Hunter College.

9. A study of R. A. Fisher's z distribution and the related F distribution.
   L. A. Aroian, Hunter College.

10. A note on the analysis of variance with unequal class frequencies.
    Abraham Wald, Columbia University.

11. An approach to problems involving disproportionate frequencies.
    Burton D. Seeley, U. S. Department of Labor.

Abstracts of these papers are given at the close of this report.

Immediately following the session just described, the Institute held its
annual business meeting. At that time President Rider announced that the newly
elected officers for the year 1940 are: President, S. S. Wilks, Princeton
University; Vice-Presidents, C. C. Craig, University of Michigan, and A. T.
Craig, University of Iowa; Secretary-Treasurer, P. R. Rider, Washington
University. At one o'clock on the same day, members of the Institute and their
guests attended the annual luncheon. At the luncheon, Professor B. H. Camp
addressed the Institute on Non-standard Deviations.

On Wednesday afternoon, the Institute met jointly with the American
Statistical Association for a program devoted to Lag Effects in Statistics and
Economics. Professor J. D. Tamarkin presided and at this time the following
papers were read:

1. Lag effects in statistics and related problems.
   A. J. Lotka, Metropolitan Life Insurance Company.

2. Some methods in the analysis of lag effects.
   H. T. Davis, Northwestern University.

3. Lag effects in economics.
   Charles F. Roos, Institute of Applied Econometrics, Inc.

A joint session with the Biometric Section of the American Statistical
Association was held on Wednesday evening, Professor George W. Snedecor
presiding. The papers presented at this session, which dealt with Design and
Analysis of Replicated Experiments, were the following:

1. Practical difficulties met in the use of experimental designs.
   A. E. Brandt, Soil Conservation Service.

2. Factorial design and covariance in the biological assay of vitamin D.
   C. I. Bliss, Sandusky, Ohio.

3. Combinatorial problems in the design of experiments.
   Gertrude M. Cox, Iowa State College.

4. Experimental trials with balanced incomplete blocks.
   W. J. Youden, Boyce Thompson Institute.

On Thursday afternoon the Institute held consecutively joint sessions with 
the American Sociological Society and the Econometric Society. At the first of 
these, Professor William F. Ogburn presided and the following program was 
presented: 

1. How the mathematician can help the sociologist.

Samuel A. Stouffer, University of Chicago.

2. Some problems of combinations and permutations as they apply to a comprehensive
classification of social groups.

George A. Lundberg, Bennington College.

Discussion: C. C. Craig, University of Michigan.

Philip M. Hauser, U. S. Bureau of the Census.

At the second session the topic for discussion was Recent Advances in Business 
Cycle Analysis and these papers were given: 

1. Recursive methods in business cycle analysis.

Merrill M. Flood, Princeton Surveys.

2. An appreciation of some recent mathematical business cycle theories.

Gerhard Tintner, Iowa State College.

3. The statisticians' new clothiers.

Arne Fisher, Western Union Telegraph Company.


Paul R. Rider, Secretary. 



ABSTRACTS OF PAPERS

(Presented on December 27, 1939, at the Philadelphia meeting of the Institute)

On the Unbiased Character of Certain Likelihood-Ratio Tests when Applied to 
Normal Systems. Joseph F. Daly, The Catholic University of America. 

Consider a random sample of N observations on a set of variates $x^1, \dots, x^k$, where
$x^1, \dots, x^k$ are assumed to be normally distributed about means which are linear functions
$m^i = \sum_j b^i_j x^j$ of the fixed variates. One is sometimes required to decide whether
the sample tends to contradict the further hypothesis, $H_0$, that the coefficients $b^i_j$ belonging
to a certain subset of the fixed variates have the specific values $b^i_{j0}$.
Such a situation occurs, for example, in the generalized analysis of variance. In this paper
it is shown that the Neyman-Pearson method of the ratio of likelihoods yields a test of $H_0$
which is (at least locally) unbiased; in other words, this test is less likely to reject $H_0$ when
the sample is in fact drawn from a normal population in which $b^i_j = b^i_{j0}$ than when it is drawn
from a normal population in which the $b^i_j$ are different from but sufficiently close to $b^i_{j0}$.
In the special cases k = 1 or h = 1 the proof goes through even without the restriction that
the true $b^i_j$ be close to $b^i_{j0}$, a result which is also implicit in the papers by P. C. Tang and
P. L. Hsu (Stat. Res. Mem., Vol. 2).

Similarly with respect to the hypothesis $H_1$ that the deviations $x^i - \sum_j b^i_j x^j$ fall into
certain mutually independent sets, the $\lambda$-test is at least locally unbiased; and it has the
additional property that the expected value of any positive integral power of $\lambda$ is greater
when $H_1$ is true than when the sample is drawn from any other normal population.

The Product Seminvariants of the Mean and a Central Moment in Samples. 

C. C. Craig, The University of Michigan. 

The method used by the author in calculating the product seminvariants of a pair of 
central moments in samples is not adapted without modification to the present problem. 
In the present paper the necessary modification is developed which gives a routine method 
for the calculation of these sampling distribution characteristics. The calculation is a 
little heavier than in the previous case but the results for the mean and the second, third,
and fourth central moments are given up to the fourth order except in one case in which the
weight is 13. It is planned to follow this with a further study of the distribution of Fisher's
t in samples from a normal population.

A Method for Minimizing the Sum of Absolute Values of Deviations. Robert 
Singleton, Princeton Local Government Survey. 

E. C. Rhodes (Philosophical Magazine, May 1930) presented a method for the estimation
of parameters in a linear regression where it is desired to minimize the sum of absolute
values of the deviations. In this paper the structure of the deviation surface is analyzed
and a method of steepest descent is developed which for computational purposes is an
improvement over Rhodes’ method. The process is finite and leads to an exact solution.
The method and the formulae used are such as to permit the successive additions of new
observations or sets of observations to the original data, or the exclusion of an observation
from the original set, and the determination of the parameters for the sets of data so derived,
with little additional labor.
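
As a concrete illustration of the criterion being minimized, the following minimal sketch fits a straight line by least absolute deviations. It is not the steepest-descent procedure of the abstract, nor Rhodes' method; it is a brute-force search that relies on the known fact that, for a line with intercept and slope, some minimizing line passes through at least two of the observations. The function names `sum_abs_dev` and `lad_line` are hypothetical.

```python
from itertools import combinations


def sum_abs_dev(points, a, b):
    """Sum of absolute deviations of the points from the line y = a + b*x."""
    return sum(abs(y - (a + b * x)) for x, y in points)


def lad_line(points):
    """Exact least-absolute-deviations line, found by exhaustive search.

    Assumption: this enumerates every line through two observations with
    distinct x and keeps the best one. It costs O(n^3) and is meant only
    as an illustration of the criterion, not as an efficient method.
    """
    best = None
    for (x1, y1), (x2, y2) in combinations(points, 2):
        if x1 == x2:
            continue  # a vertical line is not of the form y = a + b*x
        b = (y2 - y1) / (x2 - x1)
        a = y1 - b * x1
        loss = sum_abs_dev(points, a, b)
        if best is None or loss < best[0]:
            best = (loss, a, b)
    if best is None:
        raise ValueError("need at least two observations with distinct x")
    return best  # (minimum sum of absolute deviations, intercept, slope)


# Four observations on the line y = x and one outlier.
data = [(0, 0), (1, 1), (2, 2), (3, 3), (4, 20)]
loss, intercept, slope = lad_line(data)
```

In this small example the fitted line stays on the four collinear observations and the outlier contributes the whole of the remaining deviation, which is the robustness property that motivates the criterion.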

On Certain Criteria for Testing the Homogeneity of k Estimates of Variance.

C. Eisenhart and Frieda S. Swed, University of Wisconsin.

Given k variance estimates $s_1^2, s_2^2, \dots, s_k^2$ with $n_r s_r^2$, (r = 1, 2, ..., k), independently
distributed as $\chi^2 \sigma_r^2$ for $n_r$ degrees of freedom, tests of the hypothesis, $H_0$, that $\sigma_r^2 = \sigma^2$,
(r = 1, 2, ..., k), where $\sigma^2$ is unknown, have been based to date on one or the other of the
quantities

$$Q_1 = \sum_{r=1}^{k} n_r (s_r^2 - s^2)^2 / 2s^4$$

$$Q_2 = w \log (n s^2 / w) - \sum_{r=1}^{k} w_r \log (n_r s_r^2 / w_r)$$

where the $w_r$ are weights, $w = \sum_r w_r$, $n = \sum_r n_r$, and $s^2 = \sum_r n_r s_r^2 / n$. A. E. Brandt and
W. L. Stevens have advocated the use of $Q_1$, referring an observed value of $Q_1$ to the $\chi^2$
distribution for k − 1 degrees of freedom. J. Neyman, E. S. Pearson, B. L. Welch, and
M. S. Bartlett have advocated tests based on $Q_2$, Bartlett definitely proposing the use of
degrees of freedom as weights, i.e. $w_r = n_r$, and recent work of E. J. G. Pitman and others
has shown that unless $w_r = n_r$ tests based on $Q_2$ are biased. (A statistical test of an hypothesis
H is said to be unbiased when the probability of rejecting H by its use is a minimum
when H is true, obviously a desirable property.) When $w_r = n_r$ Bartlett has suggested that
the distribution of $Q_2$ can be satisfactorily approximated by referring

$$Q_2 \Big/ \left\{ 1 + \frac{1}{3(k-1)} \left( \sum_{r=1}^{k} \frac{1}{n_r} - \frac{1}{n} \right) \right\}$$

to the $\chi^2$ distribution for k − 1 degrees of freedom. In this paper we discuss
the adequacy of the $\chi^2$ distribution to describe the distribution of $Q_1$ and of the adjusted
$Q_2$ when the degrees of freedom, $n_r$, are small.

U. S. Nair and D. J. Bishop have given theoretical evidence which suggests that when
$n_r > 2$, (r = 1, 2, ..., k), Bartlett’s adjusted $Q_2$ may be expected to conform to the $\chi^2$
distribution reasonably well in the neighborhood of the 5% and 1% levels. Using 1000
samples of 4 for which $n_r s_r^2/(n_r + 1)$ has been tabulated by W. A. Shewhart in Table D,
Appendix II of his “Economic Control of Quality of Manufactured Product,” 200 values of
$Q_1$ and $Q_2$ (with adjustment) were calculated and compared with the $\chi^2$ distribution for
k − 1 degrees of freedom. Two cases were studied: Case I, k = 5 and $n_1 = n_2 = \dots = n_5 = 3$;
Case II, k = 3 and $n_1 = n_2 = 3$ while $n_3 = 9$. As measured by the Chi-Square Goodness of
Fit Test, using 11 degrees of freedom, the fits were good in all four instances. In Case I,
for Bartlett’s adjusted $Q_2$ the test led to .80 < P < .90, and to .70 < P < .80 for the
Brandt-Stevens $Q_1$; in Case II, the fits were poorer with .60 < P < .70 for Bartlett’s criterion and
.10 < P < .20 for the Brandt-Stevens. However, an examination of the descending cumulative
distributions showed that in all instances these criteria exhibited a deficiency of large
values of $\chi^2$, with the deficiency, in general, more marked in the case of the Brandt-Stevens
test. Consequently, when one uses significance levels for these criteria obtained by means
of the $\chi^2$ approximation advocated, one is in reality using a level of significance slightly
less than that professed. The discrepancy is not great, however, and is on the safe side, i.e.
one will reject $H_0$ falsely in the long run less often than one professes to be doing. Without
doubt, however, one will also detect the falsehood of $H_0$ when $\sigma_r^2 \neq \sigma_t^2$, for at least one pair
of values of r and t, r ≠ t, less often in the long run by the use of these approximate significance
levels than if the true levels were used, but we have no definite evidence at present
on this point. A somewhat disquieting feature is that the agreement between the $\chi^2$ values
yielded by the two criteria becomes worse as one proceeds toward larger values of $\chi^2$ in
terms of either quantity. Thus, of 8 samples which $Q_2$ would have rejected at the 5% level
in Case I, only 4 of these would have been rejected by $Q_1$, and $Q_2$ would have passed 3
samples of the 7 rejected by $Q_1$. Thus it appears that, if one wishes to work with a given
chance of rejecting $H_0$ falsely, one should choose one of these criteria and then stick to it in
future applications. For large values of the $n_r$ the two criteria tend to equivalence, so the
choice between them is of interest mainly for small $n_r$, but cannot be made with full
information until more is known about the bias, if any, of the Brandt-Stevens test, and the
relative power of the two tests with regard to alternatives to $H_0$.
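
The two criteria compared in this abstract can be computed as in the following minimal sketch. It assumes the degrees of freedom are used as weights ($w_r = n_r$), takes the pooled estimate to be $s^2 = \sum n_r s_r^2 / \sum n_r$, and applies the adjustment $1 + \{\sum 1/n_r - 1/n\}/\{3(k-1)\}$ to the second criterion. The function names are hypothetical, and referring the results to a $\chi^2$ table for k − 1 degrees of freedom is left to the reader.

```python
import math


def pooled_variance(s2, n):
    """Pooled estimate: sum of n_r * s_r^2 divided by the total degrees of freedom."""
    return sum(nr * v for nr, v in zip(n, s2)) / sum(n)


def brandt_stevens_q1(s2, n):
    """Q1 = sum n_r (s_r^2 - s^2)^2 / (2 s^4), with s^2 the pooled estimate."""
    pooled = pooled_variance(s2, n)
    return sum(nr * (v - pooled) ** 2 for nr, v in zip(n, s2)) / (2 * pooled ** 2)


def bartlett_q2_adjusted(s2, n):
    """Q2 with weights w_r = n_r, divided by Bartlett's adjustment factor."""
    k = len(s2)
    n_total = sum(n)
    pooled = pooled_variance(s2, n)
    q2 = n_total * math.log(pooled) - sum(nr * math.log(v) for nr, v in zip(n, s2))
    adjustment = 1 + (sum(1 / nr for nr in n) - 1 / n_total) / (3 * (k - 1))
    return q2 / adjustment


# Two variance estimates, each on 3 degrees of freedom (illustrative values).
variances = [1.0, 4.0]
dof = [3, 3]
q1 = brandt_stevens_q1(variances, dof)
q2_adj = bartlett_q2_adjusted(variances, dof)
```

Both quantities vanish when all the estimates coincide and grow as the estimates spread apart, which is why large values lead to rejection.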


On a Test Whether Two Samples are from the Same Population. A. Wald
and J. Wolfowitz, Columbia University and Brooklyn, New York.

Let X and Y be two independent random variables about whose distributions nothing is
known except that they are continuous. Let $x_1, x_2, \dots, x_m$ be a set of m independent
observations on X and let $y_1, y_2, \dots, y_n$ be a set of n independent observations on Y.
The null hypothesis to be tested is that the distributions of X and Y are identical.

Let the set of m + n observations be arranged in order of magnitude, thus: $z_1, z_2, \dots,
z_{m+n}$. Replace $z_i$ by $v_i$ (i = 1, 2, ..., m + n) where $v_i = 0$ if $z_i$ is a member of the set of
x’s and $v_i = 1$ if $z_i$ is a member of the set of y’s. Since the null hypothesis states only that
the distributions of X and Y are identical without specifying them in any other way, the
distribution of the statistic U used for testing the null hypothesis must be independent of
this common distribution of X and Y. It can easily be shown that the statistic U must be
a function only of the sequence $v_1, v_2, \dots, v_{m+n}$.

A subsequence $v_s, v_{s+1}, \dots, v_{s+r}$ (where r may also be 0) is called a run if
$v_s = v_{s+1} = \dots = v_{s+r}$ and if $v_{s-1} \neq v_s$ when s > 1 and if $v_{s+r} \neq v_{s+r+1}$ when s + r < m + n. The
statistic U defined as the number of runs in the sequence $v_1, v_2, \dots, v_{m+n}$ seems a suitable
statistic for testing the null hypothesis. A difference in the distribution functions of X
and Y tends to decrease U. Hence the critical region is defined by the inequality $U < u_0$,
where $u_0$ depends only on m, n, and the level of significance adopted. If m ≤ n and
P{U = c} is the probability that U = c, then

$$P\{U = 2K\} = \frac{2 \binom{m-1}{K-1} \binom{n-1}{K-1}}{\binom{m+n}{m}}, \qquad (K = 1, 2, \dots, m),$$

$$P\{U = 2K - 1\} = \frac{\binom{m-1}{K-1} \binom{n-1}{K-2} + \binom{m-1}{K-2} \binom{n-1}{K-1}}{\binom{m+n}{m}}, \qquad (K = 2, 3, \dots, m + 1).$$

The mean of U is

$$\frac{2mn}{m + n} + 1.$$

The variance of U is

$$\frac{2mn(2mn - m - n)}{(m + n)^2 (m + n - 1)}.$$

If $m/n = \alpha$ (a positive constant) and $m \to \infty$, the distribution of U converges to the normal
distribution.
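
The run statistic and its exact null distribution can be sketched as follows. The combinatorial formulas used are the standard ones for the number of runs in an arrangement of m elements of one kind and n of another; the function names are hypothetical, and exact fractions are used so that the probabilities can be checked against the stated mean and variance.

```python
from fractions import Fraction
from math import comb


def count_runs(x_sample, y_sample):
    """Number of runs U in the pooled, ordered sample (0 marks an x, 1 marks a y)."""
    labelled = sorted([(v, 0) for v in x_sample] + [(v, 1) for v in y_sample])
    labels = [label for _, label in labelled]
    return 1 + sum(1 for a, b in zip(labels, labels[1:]) if a != b)


def runs_distribution(m, n):
    """Exact null distribution of U for sample sizes m >= 1 and n >= 1."""
    total = comb(m + n, m)
    dist = {}
    for u in range(2, m + n + 1):
        if u % 2 == 0:                      # u = 2K
            k = u // 2
            ways = 2 * comb(m - 1, k - 1) * comb(n - 1, k - 1)
        else:                               # u = 2K - 1
            k = (u + 1) // 2
            ways = (comb(m - 1, k - 1) * comb(n - 1, k - 2)
                    + comb(m - 1, k - 2) * comb(n - 1, k - 1))
        if ways:
            dist[u] = Fraction(ways, total)
    return dist


def lower_tail(m, n, u_observed):
    """P{U <= u_observed} under the null hypothesis; small values are critical."""
    return sum(p for u, p in runs_distribution(m, n).items() if u <= u_observed)


dist = runs_distribution(2, 3)
mean_u = sum(u * p for u, p in dist.items())
var_u = sum(u * u * p for u, p in dist.items()) - mean_u ** 2
observed = count_runs([1, 2, 5], [3, 4, 6])
```

Because the samples are assumed continuous, ties between an x and a y have probability zero; the sketch makes no special provision for them.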


The Distribution of Quadratic Forms in Non-Central Normal Random Variables.
William G. Madow, Washington, D. C. (Presented to the Institute
under a slightly different title.)

Let the distribution of a sum of non-central squares of normally and independently
distributed random variables which have unit variances be called the $\chi'^2$ distribution.
It is proved that if a set of quadratic forms have a sum which is the sum of the squares of
their variables, then a necessary and sufficient condition that the quadratic forms be
independently distributed in $\chi'^2$ distributions is that the rank of the sum of quadratic forms be
equal to the sum of the ranks of the quadratic forms. Furthermore, the constants on which
the $\chi'^2$ distributions depend may be obtained by substituting the values about which the
variables are taken for the variables themselves in the quadratic forms. Roughly speaking,
the theorem states that if a set of quadratic forms satisfy the conditions of the Fisher-Cochran
theorem when the true means vanish, then the set of quadratic forms will be
independently distributed in $\chi'^2$ distributions when the true means do not vanish.


Some Theoretical Aspects of the Use of Transformations in the Statistical 
Analysis of Replicated Experiments. W. G. Cochran, Iowa State College.

The device of transforming the data to a different scale before performing an analysis of
variance has recently been recommended by a number of writers for replicated experiments
in which the original data show a markedly skew distribution. The use of transformations
to obtain an approximate analysis has been supported mainly on the grounds that in the
transformed scale the true experimental error variance is approximately the same on all
plots. This paper considers the relation of the method of transformations to a more exact
analysis. Discussion is confined to the $\sqrt{x}$ and $\sin^{-1}\sqrt{x}$ transformations, which appear
to receive the most frequent use in practice.

To obtain an exact analysis, it is necessary to specify (i) how the expected value on any
plot is obtained from unknown parameters representing the treatment and block (or row
and column) effects, (ii) how the observed values on the plots vary about the expected
values. If the latter variation follows the Poisson law (a case to which the square root
transformation has been considered appropriate), the equations of estimation by maximum
likelihood take the form

(1) $$\sum \frac{x - m}{m} \frac{\partial m}{\partial c} = 0,$$

where x is the observed and m the expected value on any plot, c is a typical unknown
parameter, and the summation extends over all plots whose expectations involve c. As the
number of parameters is usually large (e.g. 16 in a 6 × 6 Latin square), these equations are
laborious to solve; moreover, the question of obtaining small-sample tests of significance is
difficult. It is shown that if a particular form can be assumed for the prediction formula
in (i), namely that $\sqrt{m}$ is a linear function of the treatment and block (or row and column)
constants, the equations of estimation may be reduced to the simpler form


(2) $$\sum (r' - \sqrt{m}) = 0,$$

where $r' = \frac{1}{2}\left(\sqrt{m} + \frac{x}{\sqrt{m}}\right)$ is a function closely related to the square root of x. It follows
that the statistical analysis in square roots, with some slight adjustments, coincides with
the maximum likelihood solution, provided that the above form can be assumed for the
prediction formula. The appropriateness of this form in practice is briefly considered and a
“goodness of fit” test by $\chi^2$ is developed. A numerical example is worked as an illustration
and indicates that a good approximation is obtained by the transformation alone even
with very small numbers per plot. The corresponding theory is also discussed for the inverse
sine transformation, which applies where the original data are percentages or fractions
whose experimental errors are derived from the binomial distribution.




In practice the type of analysis outlined above is unlikely to supplant the simple use of
transformations, because it can seldom be assumed that the experimental variance is
entirely of the Poisson or binomial type. The more exact analysis may, however, be
useful (i) for cases in which the plot yields are very small integers or the ratios of very
small integers, (ii) in showing how to give proper weight to an occasional zero plot yield.
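
The connection between the square-root analysis and maximum likelihood can be seen in the simplest possible case. The sketch below assumes a single treatment with no block effects, so that $\sqrt{m}$ equals one constant c on every plot, and assumes the adjusted variate has the form $r' = \frac{1}{2}(\sqrt{m} + x/\sqrt{m})$. Repeatedly averaging $r'$ and re-estimating c then converges to the square root of the mean count, which is the Poisson maximum likelihood estimate of $\sqrt{m}$. The function name is hypothetical.

```python
def sqrt_scale_estimate(counts, tol=1e-12, max_iter=100):
    """Iterate c <- mean of r', where r' = (c + x / c) / 2 on each plot.

    Assumption: one common expectation m = c**2 for all plots. The starting
    value is the plain mean of the square roots, i.e. the unadjusted
    transformed analysis.
    """
    root_m = sum(x ** 0.5 for x in counts) / len(counts)
    if root_m == 0:
        return 0.0  # every count is zero, so the estimate of m is zero
    for _ in range(max_iter):
        adjusted = [0.5 * (root_m + x / root_m) for x in counts]
        new_root_m = sum(adjusted) / len(adjusted)
        if abs(new_root_m - root_m) < tol:
            return new_root_m
        root_m = new_root_m
    return root_m


# Small plot yields, including a zero yield.
plot_counts = [0, 1, 4, 3]
estimate = sqrt_scale_estimate(plot_counts)
unadjusted = sum(x ** 0.5 for x in plot_counts) / len(plot_counts)
```

In this example the unadjusted mean of square roots falls below the adjusted estimate, and the zero yield enters the adjusted variate as c/2 rather than as 0, which is one way of seeing how the adjustment weights an occasional zero plot.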

The Standard Errors of Geometric and Harmonic Types of Index Numbers. 
Nilan Norris, Hunter College.

Various statisticians have made empirical studies of the sampling errors of certain types 
of index numbers used in the United States and England. None of these writers has taken 
advantage of the tools afforded by the modern theory of estimation, including fiducial 
inference, as a means of arriving at direct and general expressions for estimating the stand¬ 
ard deviations of the sampling errors of geometric and harmonic types of index numbers. 

A known expression for the first approximation to the variance of a function, as given by 
the relation between the variance of the function and the variance of the argument, is 
valid for that general class of distributions of which the variance and a higher moment 
are finite. With the aid of this relation, there appear simple and useful forms for estimat¬ 
ing the standard errors of geometric and harmonic types of indexes. For sufficiently large 
samples, these forms are valid for all of the types of distributions of price relatives, produc¬ 
tion relatives, and similar observations ordinarily encountered, provided that there are 
satisfied the necessary conditions for drawing sound inferences on the basis of sampling 
without reference to the value of the variate. 

Necessary conditions for using tests of significance soundly in connection with index 
number problems are those of realistic and intimate acquaintance with observations, and 
careful attention to certain broad theoretical considerations which determine whether or 
not the index is suited for the purpose for which it is used.
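
The first-approximation relation mentioned above, which links the variance of a function to the variance of its argument, can be illustrated as follows. This is a minimal sketch of the usual large-sample argument, not necessarily the exact expressions of the paper: the geometric mean is the exponential of the mean logarithm, and the harmonic mean is the reciprocal of the mean reciprocal, so each standard error is the derivative of the function times the standard error of the corresponding arithmetic mean. The function names are hypothetical.

```python
import math
from statistics import mean, stdev


def geometric_mean_se(relatives):
    """Geometric mean G and its approximate standard error G * s_log / sqrt(n)."""
    logs = [math.log(x) for x in relatives]
    g = math.exp(mean(logs))
    return g, g * stdev(logs) / math.sqrt(len(logs))


def harmonic_mean_se(relatives):
    """Harmonic mean H and its approximate standard error H^2 * s_recip / sqrt(n)."""
    recips = [1 / x for x in relatives]
    h = 1 / mean(recips)
    return h, h ** 2 * stdev(recips) / math.sqrt(len(recips))


# Illustrative price relatives (all positive, as the logarithm requires).
price_relatives = [1.0, 2.0, 4.0]
g, g_se = geometric_mean_se(price_relatives)
h, h_se = harmonic_mean_se(price_relatives)
```

The approximation is a large-sample one; with only three relatives, as here, the figures serve to check the arithmetic rather than to support inference.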

A Study of R. A. Fisher’s z Distribution and the Related F Distribution. L. A. 
Aroian, Hunter College. 

The following results for the z distribution and related F distribution are investigated: 

(1) Geometric properties. 

(2) Exact values of the seminvariants and moments of z. Exact values of the first
four central moments of F.

(3) The approach to normality of both distributions as $n_1$ and $n_2$ become large in any
manner whatever.

(4) The Pearson types of approximating curves, the logarithmic normal approximation, 
the Gram-Charlier approximation, and the uses of these in finding any level of 
significance of z and of F. 

A Note on the Analysis of Variance with Unequal Class Frequencies. Abraham 
Wald, Columbia University. 

Let us consider p groups of variates and denote by $m_j$ (j = 1, ..., p) the number of
elements in the j-th group. Let $x_{ij}$ be the i-th element in the j-th group. We assume that
$x_{ij}$ is the sum of two variates $\epsilon_{ij}$ and $\eta_j$, i.e. $x_{ij} = \epsilon_{ij} + \eta_j$, where $\epsilon_{ij}$ (i = 1, ..., $m_j$;
j = 1, ..., p) is normally distributed with mean $\mu$ and variance $\sigma^2$, and $\eta_j$ (j = 1, ..., p) is
normally distributed with mean $\mu'$ and variance $\sigma'^2$. All the variates $\epsilon_{ij}$ and $\eta_j$ are supposed
to be distributed independently. The intra-class correlation $\rho$ is given by

$$\rho = \frac{\sigma'^2}{\sigma^2 + \sigma'^2}.$$

Confidence limits for $\rho$ have been derived only in case of equal class frequencies, i.e.
$m_1 = m_2 = \dots = m_p$. We give here the confidence limits for $\rho$ in case of unequal class
frequencies. Since $\rho$ is a monotonic function of $\sigma'^2/\sigma^2$, it is sufficient to derive confidence
limits for $\sigma'^2/\sigma^2$. Denote $\sigma'^2/\sigma^2$ by $\lambda^2$ and the arithmetic mean of the j-th group by $\bar{x}_j$. Let

$$w_j = \frac{m_j}{1 + m_j \lambda^2}, \qquad f(\lambda^2) = \frac{N - p}{p - 1} \cdot \frac{\sum_j w_j \left( \bar{x}_j - \sum_j w_j \bar{x}_j / \sum_j w_j \right)^2}{\sum_j \sum_i (x_{ij} - \bar{x}_j)^2},$$

and denote by $F_1$ and $F_2$ the lower and upper confidence limits respectively of F, where F
has the analysis of variance distribution with p − 1 and $N - p = m_1 + \dots + m_p - p$
degrees of freedom. Then the lower confidence limit $\lambda_1^2$ of $\lambda^2$ is given by the root of the
equation in $\lambda^2$:

(1) $f(\lambda^2) = F_2$,

and the upper confidence limit $\lambda_2^2$ of $\lambda^2$ is given by the root of

(2) $f(\lambda^2) = F_1$.

For calculating the roots of (1) and (2), we can make use of the fact that $f(\lambda^2)$ is
monotonically decreasing with increasing $\lambda^2$.

An Approach to Problems Involving Disproportionate Frequencies. Burton 
D. Seeley, Washington, D. C.

Applied mechanics offers an analysis of variance solution to problems of multiple
classification involving disproportionate sub-class numbers. The quality of orthogonality may
be attained in such problems by measuring the variability between classes of any one
classification after centering the others. This approach, which is not limited by the number
of classes or the number of classifications, treats the problem involving equal sub-class
numbers as a special phase of the general analysis of variance.



CONSTITUTION 

OF THE 

INSTITUTE OF MATHEMATICAL STATISTICS 

ARTICLE I 
Name and Purpose 

1. This organization shall be known as the Institute of Mathematical Statistics. 

2. Its object shall be to promote the interests of mathematical statistics.

ARTICLE II 
Membership 

1. The membership of the Institute shall consist of Members, Fellows, Honorary
Members, and Sustaining Members. 

2. Voting members of the Institute shall be (a) the Fellows, and (b) all others who 
have been members for twenty-three months prior to the date of voting. 

ARTICLE III 

Officers, Board of Directors, Committee on Membership, and Committee on 

Publications 

1. The Officers of the Institute shall be a President, two Vice-Presidents, and a Secre¬ 
tary-Treasurer, elected for a term of one year by a majority ballot at the annual meeting 
of the Institute. Voting may be in person or by mail. 

(a) Exception. The first group of Officers shall be elected by a majority vote of the
individuals present at the organization meeting, and shall serve until December 31, 1936.

2. The Board of Directors of the Institute shall consist of the Officers and the previous
President.

3. The Institute shall have a Committee on Membership composed of three Fellows.
At their first meeting subsequent to the adoption of this Constitution, the Board of 
Directors shall elect three members as Fellows to serve as the Committee on Membership, 
one member of the Committee for a term of one year, another for a term of two years, 
and another for a term of three years. Thereafter the Board of Directors shall elect 
from among the Fellows one member annually at their first meeting after their election 
for a term of three years. The President shall designate one of the Vice-Presidents as
Chairman of this Committee. 

4. The Institute shall have a Committee on Publications composed of three Members 
or Fellows elected by the Board of Directors. The President shall designate a
Vice-President as Ex Officio Chairman of this Committee.

ARTICLE IV 

Meetings

1. A meeting for the presentation and discussion of papers, for the election of Officers, 
and for the transaction of other business of the Institute shall be held annually at such
time as the Board of Directors may designate. Additional meetings may be called from
time to time by the Board of Directors and shall be called at any time by the President
upon written request from ten Fellows. Notice of the time and place of meeting shall be
given to the membership by the Secretary-Treasurer at least thirty days prior to the
date set for the meeting. All meetings except executive sessions shall be open to the
public. Only papers accepted by a Program Committee appointed by the President may 
be presented to the Institute. 

2. The Board of Directors shall hold a meeting immediately after their election and 
again immediately before the expiration of their term. Other meetings of the Board
may be held from time to time at the call of the President or any two members of the 
Board. Notice of each meeting of the Board, other than the two regular meetings, 
together with a statement of the business to be brought before the meeting, must be 
given to the members of the Board by the Secretary-Treasurer at least five days prior to 
the date set therefor. Should other business be passed upon, any member of the Board 
shall have the right to reopen the question at the next meeting. 

3. The Committee on Membership shall hold a meeting immediately after the annual 
meeting of the Institute. Further meetings of the Committee may be held from time to 
time at the call of the Chairman or any member of the Committee provided notice of such 
call and the purpose of the meeting is given to the members of the Committee by the 
Secretary-Treasurer at least five days before the date set therefor. Should other business 
be passed upon, any member of the Committee shall have the right to reopen the ques¬ 
tion at the next meeting. 

4. At a regularly convened meeting of the Board of Directors, three members shall
constitute a quorum. At a regularly convened meeting of the Committee on Member¬ 
ship, two members shall constitute a quorum. 

ARTICLE V 

Publications 

1. The Annals of Mathematical Statistics shall be the Official Journal for the Institute.
Other publications may be originated by the Board of Directors as occasion arises. 

ARTICLE VI 
Expulsion or Suspension

1. Except for non-payment of dues, no one shall be expelled or suspended except by 
action of the Board of Directors with not more than one negative vote. 

ARTICLE VII 

Amendments 

1. This constitution may be amended by an affirmative two-thirds vote at any regularly
convened meeting of the Institute provided notice of such proposed amendment
shall have been sent to each voting member by the Secretary-Treasurer at least thirty 
days before the date of the meeting at which the proposal is to be acted upon. Voting 
may be in person or by mail. 

BY-LAWS 

ARTICLE I 

Duties of the Officers, Board of Directors, Committee on Membership, and

Committee on Publications 

1. The President, or in his absence, one of the Vice-Presidents, or in the absence of the 
President and both Vice-Presidents, a Fellow selected by vote of the Fellows present, 
shall preside at the meetings of the Institute and of the Board of Directors. At meetings
of the Institute, the presiding officer shall vote only in the case of a tie, but at meetings
of the Board of Directors he may vote in all cases. At least three months before the date
of the annual meeting, the President shall appoint a Nominating Committee of three
members. It shall be the duty of the Nominating Committee to make nominations for
Officers to be elected at the annual meeting and the Secretary-Treasurer shall notify all
voting members at least thirty days before the annual meeting. Additional nominations
may be submitted in writing, if signed by at least ten Fellows of the Institute, up to
the time of the meeting. 

2. The Secretary-Treasurer shall keep a full and accurate record of the proceedings
at the meetings of the Institute and of the Board of Directors, send out calls for said 
meetings and, with the approval of the President and the Board, carry on the corre¬ 
spondence of the Institute. Subject to the direction of the Board, he shall have charge 
of the archives and other tangible and intangible property of the Institute. He shall 
send out calls for annual dues and acknowledge receipt of same; pay all bills approved 
by the President for expenditures authorized by the Board or the Institute; keep a 
detailed account of all receipts and expenditures, prepare a financial statement at the 
end of each year and present an abstract of the same at the annual meeting of the Insti¬ 
tute after it has been audited by a Member or Fellow of the Institute appointed by the 
President as Auditor. The Auditor shall report to the President. 

3. The Board of Directors shall have charge of the funds and of the affairs of the 
Institute, with the exception of those affairs specifically assigned to the President or to 
the Committee on Membership. The Board shall have authority to fill all vacancies 
ad interim, occurring among the Officers, Board of Directors, or in any of the Committees. 
The Board may appoint such other committees as may be required from time to time 
to carry on the affairs of the Institute. 

4. The Committee on Membership shall prepare and make available through the 
Secretary-Treasurer an announcement indicating the qualifications requisite for the 
different grades of membership. 

5. The Committee on Publications, under the general supervision of the Board of 
Directors, shall have charge of all matters connected with the publications of the Insti¬ 
tute, and of all books, pamphlets, manuscripts and other literary or scientific material 
collected by the Institute. Once a year this Committee shall cause to be printed in the 
Official Journal the Constitution and By-Laws and a classified list of all the Members 
and Fellows of the Institute. 


ARTICLE II 
Dues

1. Members shall pay five dollars at the time of admission to membership and shall
receive the full current volume of the Official Journal. Thereafter, Members shall pay 
five dollars annual dues. The annual dues of Fellows shall be five dollars. The annual 
dues of Sustaining Members shall be fifty dollars. Honorary Members shall be exempt 
from all dues.

2. Annual dues shall be payable on the first day of January of each year. 

3. The annual dues of a Fellow or Member include a subscription to the Official 
Journal. The annual dues of a Sustaining Member include two subscriptions to the 
Official Journal.

4. It shall be the duty of the Secretary-Treasurer to notify by mail anyone whose dues
may be six months in arrears, and to accompany such notice by a copy of this Article.
If such person fail to pay such dues within three months from the date of mailing such
notice, the Secretary-Treasurer shall report the delinquent one to the Board of Directors,
by whom the person's name may be stricken from the rolls and all privileges of membership
withdrawn. Such person may, however, be re-instated by the Board of Directors
upon payment of the arrears of dues. 

ARTICLE III 
Salaries

1. The Institute shall not pay a salary to any Officer, Director, or member of any 
committee. 

ARTICLE IV 

Amendments 

1. These By-Laws may be amended in the same manner as the Constitution or by a 
majority vote at any regularly convened meeting of the Institute, if the proposed amend¬ 
ment has been previously approved by the Board of Directors. 

2. The Laplace-Liapounoff theorem.⁴ We shall first state some definitions
and terminology which will be used throughout the paper.

If used as subscripts or superscripts, or as indices of summation or multiplication,
the letters i, j will take on all integral values from 1 through p, the letters
$\mu$, $\nu$ will take on all integral values from 1 through n, the letters $\gamma$, $\delta$ will take on
all integral values from 1 through m, the letter $\alpha$ will take on all integral values
from 1 through k, and the letter $\beta$ will take on all integral values from 1 through
k − 1, unless explicit statement to the contrary is made.

The totality of all sets of $\nu$ real numbers will be denoted by $R^\nu$. Thus $R^\nu$ is
the combinatory product of the spaces $R^1, R^1, \dots, R^1$ ($\nu$ times).

If $x_1, \dots, x_n$ are random variables, and if A is a proposition concerning
$x_1, \dots, x_n$, then by P{A} we shall mean “the probability that A.” The
distribution function of the random variables $x_1, \dots, x_n$ will be denoted by
$F(x_1, \dots, x_n)$, i.e.,

$$F(x_1^0, \dots, x_n^0) = P\{x_1 < x_1^0, \dots, x_n < x_n^0\}$$

for all sets of n real numbers. Thus F will have an operational meaning in
this paper.

If $A(x_1, \dots, x_n)$ is a function of $x_1, \dots, x_n$ defined on $R^n$ and measurable⁵
with respect to $F(x_1, \dots, x_n)$, then $E\{A(x_1, \dots, x_n)\}$ will be defined by the
equation

$$E\{A(x_1, \dots, x_n)\} = \int_{R^n} A(x_1, \dots, x_n) \, dF(x_1, \dots, x_n),$$

where the integral is a Lebesgue-Stieltjes or Radon integral. Hence
$|A(x_1, \dots, x_n)|$ is assumed to be integrable with respect to $F(x_1, \dots, x_n)$.

If $\Omega(y_1, \dots, y_p)$ is a single valued measurable function of $y_1, \dots, y_p$ on
$R^p$, and if $y_i$ is a real single valued Borel measurable⁶ function of $x_1, \dots, x_n$
on $R^n$, then upon substituting for $y_1, \dots, y_p$ it is seen that $\Omega(y_1, \dots, y_p)$


1 Although the theorems will be stated in terms of probability distributions, Borel 
measurability, and Lebesgue-Stieltjes integrability, it may simplify the reading if the 
words ‘‘probability distributions" are replaced by probability densities or statistical 
distributions, ‘‘Borel measurability" are replaced by continuity, and “Lebesgue-Stieltjes 
integrability 1 ' are replaced by Riemann integrability. 

5 A function $A(x_1, \ldots, x_n)$ defined on $R^n$ is said to be measurable with respect to a
distribution function $F(x_1, \ldots, x_n)$ if the set $E(t)$ of all $x_1, \ldots, x_n$ such that
$A(x_1, \ldots, x_n) < t$ is such that $\int_{E(t)} dF(x_1, \ldots, x_n)$ is defined for all t.

6 All subsets of $R^n$ which may be formed from the totality of intervals of $R^n$ by repeated
summations or multiplications of not more than a denumerable number of intervals of
$R^n$, and $R^n$ itself, constitute the totality of Borel sets of $R^n$. The function
$y(x_1, \ldots, x_n)$, defined on $R^n$, is a Borel measurable function of $x_1, \ldots, x_n$ on $R^n$
if the set of values of $x_1, \ldots, x_n$ such that $y(x_1, \ldots, x_n) < t$ is a Borel set for all t.
The class of continuous functions is contained in the class of Borel measurable functions.
For further details, see [3, chs. 1, 2], [11, ch. 3] and [17, chs. 1, 2, 3].
is a single-valued measurable function, $A(x_1, \ldots, x_n)$, of $x_1, \ldots, x_n$ on $R^n$.

If $x_1, \ldots, x_n$ are random variables, then $y_1, \ldots, y_p$ are random variables,
and 7

(2.1) $E\{\Omega(y_1, \ldots, y_p)\} = E\{A(x_1, \ldots, x_n)\}.$

We shall call $E(x_i)$ the mean value of $x_i$, $\sigma_{ij}$ the covariance of $x_i$ and $x_j$,
and $\sigma_{ii}$ or $\sigma_i^2$ the variance of $x_i$, where $\sigma_{ij} = E\{(x_i - E x_i)(x_j - E x_j)\}$.

The Laplace-Liapounoff, or Central Limit theorem states conditions under 
which linear functions of random variables have a normal limiting distribution. 
The general characteristic of the proofs of the theorem is that conditions are 
placed on the random variables so that they may virtually be assumed to be 
bounded. The Lindeberg 8 condition, which we shall use, is perhaps the least 
restrictive of all the conditions which require finite means and variances. 

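The Lindeberg condition 9, $\mathfrak{L}_p$: A set of random variables $x_{i\nu n}$ will be said to
satisfy the Lindeberg condition if there exists, for any preassigned positive
real numbers $\delta$ and $\epsilon$, a positive integer $n_0$ such that if $n > n_0$, then

$\sum_\nu \int_{|z_{\nu n}| > \epsilon} z_{\nu n}^2 \, dF(x_{1\nu n}, \ldots, x_{p\nu n}) < \delta,$

where

$z_{\nu n}^2 = x_{1\nu n}^2 + x_{2\nu n}^2 + \cdots + x_{p\nu n}^2$

and

$\sigma_{i1n}^2 + \sigma_{i2n}^2 + \cdots + \sigma_{inn}^2 = 1.$

If

$x_{i\nu n} = x_{i\nu}/s_{in}$, where $s_{in}^2 = \sigma_{i1}^2 + \cdots + \sigma_{in}^2$,

and the $x_{i\nu n}$ satisfy $\mathfrak{L}_p$, then we shall say that the $x_{i\nu}$ satisfy $\mathfrak{L}_p$.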

Suppose that the random variables $y_{11}, \ldots, y_{p m_p}$ have a normal multivariate
distribution with zero means and with covariance parameters $\sigma_{i\gamma, j\delta}$ where

$\sigma_{i\gamma, j\delta} = E\{y_{i\gamma} y_{j\delta}\}, \quad \gamma = 1, \ldots, m_i; \ \delta = 1, \ldots, m_j,$

and denote the distribution function of $y_{11}, \ldots, y_{p m_p}$ by N(y). Then we may
state the Laplace-Liapounoff theorem as:

7 It is noted that $\Omega(y_1, \ldots, y_p)$ is integrated with respect to $F(y_1, \ldots, y_p)$ and
$A(x_1, \ldots, x_n)$ is integrated with respect to $F(x_1, \ldots, x_n)$.

8 See Cramer [3, pp. 57, 60, 114], and the references there given. 

9 It is not difficult to show that the Lindeberg condition will be satisfied if moments of
order greater than two exist, [3, p. 60], or if the conditions stated by Levy [13, p. 207]
and [14, p. 106] are satisfied.
Theorem I. Suppose that, for each value of n, the random variables $x_{i\gamma\nu n}$,
which are independent for different values of $\nu$, have zero means and covariance
parameters $\sigma_{i\gamma, j\delta, \nu n}$, where

$\sigma_{i\gamma, j\delta, \nu n} = E(x_{i\gamma\nu n} x_{j\delta\nu n}).$

Denote by $d'_n$ the maximum of the variances $\sigma_{i\gamma, i\gamma, \nu n}$. If the functions $y_{i\gamma n}$ are
defined by the equations

$y_{i\gamma n} = \sum_\nu x_{i\gamma\nu n},$

it follows that

$\sigma_{i\gamma, j\delta, n} = E(y_{i\gamma n} y_{j\delta n}) = \sum_\nu \sigma_{i\gamma, j\delta, \nu n}.$

If $\lim_{n\to\infty} \sigma_{i\gamma, j\delta, n} = \sigma_{i\gamma, j\delta}$ and if $\lim_{n\to\infty} d'_n = 0$, then a necessary and
sufficient condition that as $n \to \infty$, the limiting distribution 10 of
$y_{11n}, \ldots, y_{p m_p n}$ be N(y) is that the condition $\mathfrak{L}_{p m_p}$ be satisfied.

The proof of this theorem is omitted. It may readily be developed from the 
proofs of Cramer, [3, pp. 57, 113] 

Before stating certain corollaries which are of interest, some additional 
definitions are necessary. 

Let $C_n, C_{n+1}, \ldots$ be a sequence of m rowed real matrices

$C_n = \| c_{\gamma\nu n} \|, \quad n = m, m + 1, \ldots,$

and let the greatest of the absolute values of the elements of $C_n$ be denoted by
$d_n$. The inner product of any two rows of $C_n$ will be denoted by $\rho_{\gamma\delta n}$, i.e.

$\rho_{\gamma\delta n} = \sum_\nu c_{\gamma\nu n} c_{\delta\nu n}.$


Let $X_1, X_2, \ldots$ be a sequence of random vectors of p components defined
on $R^p$, and let the components of $X_\nu$ be denoted by $x_{1\nu}, \ldots, x_{p\nu}$. Let the
components of the chance matrix $Y_n = \| y_{i\gamma n} \|$, which has p rows and m columns,
be defined by the equations

(2.2) $y_{i\gamma n} = \sum_\nu c_{\gamma\nu n} x_{i\nu}$

for each value of n, ($n = m, m + 1, \ldots$; $m > p$).


10 The distribution functions $F(X_n)$ will be said to converge to the distribution function
F(X) if and only if

$\lim_{n\to\infty} \int_{-\infty}^{X} dF(X_n) = F(X)$

for every X at which F(X) is continuous. If F(X) is continuous throughout $R^p$, then the
convergence is uniform.
Suppose that

(2.3) $E(x_{i\nu}) = 0$

and

(2.4) $E(x_{i\mu} x_{j\nu}) = \sigma_{ij} \delta_{\mu\nu},$

where $\delta_{\mu\nu} = 1$ if $\mu = \nu$ and $= 0$ if $\mu \neq \nu$. (There should be no confusion of
this use of the letter $\delta$ with its use as an index.) It is easy to see that if the
$c_{\gamma\nu n}$ are real numbers, then

$E(y_{i\gamma n}) = 0$

and

$E(y_{i\gamma n} y_{j\delta n}) = \sigma_{ij} \rho_{\gamma\delta n}.$
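
The second identity is a direct expansion of the double sum, in which (2.4) removes every cross term with $\mu \neq \nu$. The short Python sketch below checks it numerically. It is an illustration only: the function names and the numerical values of $C_n$ and $(\sigma)$ are assumptions chosen for the example and do not come from the paper.

```python
from itertools import product

def rho(C, g, d):
    """Inner product of rows g and d of C (the quantity rho_{gamma delta n})."""
    return sum(cg * cd for cg, cd in zip(C[g], C[d]))

def cov_y_bruteforce(C, sigma, i, g, j, d):
    """E(y_{i g} y_{j d}) expanded term by term, using (2.4):
    E(x_{i mu} x_{j nu}) equals sigma[i][j] when mu == nu and 0 otherwise."""
    n = len(C[0])
    total = 0.0
    for mu, nu in product(range(n), repeat=2):
        e_xx = sigma[i][j] if mu == nu else 0.0
        total += C[g][mu] * C[d][nu] * e_xx
    return total

# Hypothetical example: m = 2 rows, n = 3 columns, p = 2 components.
C = [[0.5, -1.0, 2.0],
     [1.5, 0.25, -0.5]]
sigma = [[2.0, 0.3],
         [0.3, 1.0]]

# The expanded double sum agrees with sigma_ij * rho_{gamma delta} in every case.
for i, j, g, d in product(range(2), repeat=4):
    lhs = cov_y_bruteforce(C, sigma, i, g, j, d)
    rhs = sigma[i][j] * rho(C, g, d)
    assert abs(lhs - rhs) < 1e-12
```

Nothing in the check requires the rows of the matrix to be orthogonal; orthonormal rows only matter later, when the limiting covariance is required to factor into independent columns.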


Let the determinant of the positive definite symmetric matrix, $(\sigma) = \| \sigma_{ij} \|$,
be denoted by $\sigma$. Let the inverse matrix of $(\sigma)$ be denoted by $(\sigma)^{-1} = \| \sigma^{ij} \|$,
where $\sigma^{ij}$ is the cofactor of $\sigma_{ij}$ in $(\sigma)$ divided by $\sigma$. The determinant of
$(\sigma)^{-1}$ is $\sigma^{-1}$.

By $N_d(x_1, \ldots, x_p; (\sigma))$ we shall mean the normal probability density with
zero means and covariance parameters $\sigma_{ij}$, i.e.,

$N_d(x_1, \ldots, x_p; (\sigma)) = (2\pi)^{-p/2} \sigma^{-1/2} \exp\left[-\tfrac{1}{2} \sum_{i,j} \sigma^{ij} x_i x_j\right], \quad (-\infty < x_i < \infty),$

where $(\sigma)$ is a positive definite matrix. If the random variables $x_1, \ldots, x_p$
have probability density $N_d(X; (\sigma)) = N_d(x_1, \ldots, x_p; (\sigma))$, where X is a vector,
then we shall say that X has a distribution function $N(X; (\sigma))$, i.e.

$\frac{\partial^p N(X; (\sigma))}{\partial x_1 \cdots \partial x_p} = N_d(X; (\sigma))$

or

$\int_{-\infty}^{x_p} \cdots \int_{-\infty}^{x_1} N_d(t_1, \ldots, t_p; (\sigma)) \, dt_1 \cdots dt_p = N(X; (\sigma)).$


Inasmuch as certain hypotheses will be used on several occasions in this
paper, they are stated here.

If $X_1, X_2, \ldots$ are independently distributed, if (2.3) and (2.4) hold and if
the x's satisfy the condition $\mathfrak{L}_p$, then we shall say that $\mathcal{H}_p$ is true.

If $C_n$ is such that, for all n, the equations $\rho_{\gamma\delta n} = \delta_{\gamma\delta}$ are true, we shall say
that $\mathcal{C}$ is true.

The following corollary is useful in deriving limiting distributions in the 
analysis of variance. 

Corollary I. Let $\mathcal{H}_p$ and $\mathcal{C}$ be true. Then a sufficient condition that

$\lim_{n\to\infty} F(Y_n) = \prod_\gamma N(y_{1\gamma}, \ldots, y_{p\gamma}; (\sigma))$

is $\lim_{n\to\infty} d_n = 0$.
The proof is based on the fact that the $x_{i\gamma\nu n}$ of Theorem I are given by $c_{\gamma\nu n} x_{i\nu}$.
The details are omitted.
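
As an informal illustration of Corollary I with p = 1, the Python sketch below builds a matrix whose rows are orthonormal and whose largest element tends to zero, and then forms the linear functions (2.2) from non-normal observations. Everything specific in it is an assumption made for the example: the cosine rows, the uniform distribution of the x's, and all function names are hypothetical and are not taken from the paper.

```python
import math
import random

def cosine_rows(m, n):
    """First m rows of an orthonormal cosine basis of R^n (requires m < n).

    Entry (gamma, nu) is sqrt(2/n) * cos(pi * (nu + 1/2) * gamma / n),
    so every entry is at most sqrt(2/n) in absolute value."""
    scale = math.sqrt(2.0 / n)
    return [[scale * math.cos(math.pi * (nu + 0.5) * gamma / n)
             for nu in range(n)]
            for gamma in range(1, m + 1)]

def row_inner_product(C, g, d):
    """rho_{gamma delta n}: inner product of rows g and d of C."""
    return sum(a * b for a, b in zip(C[g], C[d]))

def max_abs_element(C):
    """d_n: the largest absolute value among the elements of C."""
    return max(abs(c) for row in C for c in row)

def transform(C, x):
    """y_gamma = sum over nu of c_{gamma nu} * x_nu, for each row gamma."""
    return [sum(c * v for c, v in zip(row, x)) for row in C]

def simulate_variances(m, n, reps, seed=0):
    """Sample second moments of y_1..y_m for i.i.d. uniform x's with unit variance."""
    rng = random.Random(seed)
    C = cosine_rows(m, n)
    half_width = math.sqrt(3.0)  # uniform(-sqrt(3), sqrt(3)) has variance 1
    sums = [0.0] * m
    for _ in range(reps):
        x = [rng.uniform(-half_width, half_width) for _ in range(n)]
        for g, y in enumerate(transform(C, x)):
            sums[g] += y * y
    return [s / reps for s in sums]

if __name__ == "__main__":
    for n in (10, 100, 1000):
        print(n, max_abs_element(cosine_rows(3, n)))  # bounded by sqrt(2/n)
    print(simulate_variances(3, 200, 1000))           # each entry should be near 1
```

The orthonormality of the rows corresponds to the hypothesis on $\rho_{\gamma\delta n}$, and the bound $\sqrt{2/n}$ on the elements gives $d_n \to 0$. A simulation of this kind can only suggest the limiting behaviour; it does not replace the proof.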

The pm rowed square matrix, $(\tau) = \| \tau_{rs} \|$, is defined as follows: If $r \leq m$,
$s \leq m$; then $\tau_{rs} = \sigma_{11} \rho_{rs}$; and if $km < r \leq (k + 1)m$, $lm < s \leq (l + 1)m$,
$l, k = 0, \ldots, p - 1$, then $\tau_{rs} = \sigma_{k+1, l+1} \rho_{r-km, s-lm}$. The inverse matrix of
$(\tau)$, and the determinants of $(\tau)$ and $(\tau)^{-1}$ are defined as are $(\sigma)^{-1}$, $\sigma$ and $\sigma^{-1}$.
Corollary II. Let $\mathcal{H}_p$ be true, and let

$\lim_{n\to\infty} \rho_{\gamma\delta n} = \rho_{\gamma\delta}, \quad \rho_{\gamma\gamma} = 1.$

Then, if $\lim_{n\to\infty} d_n = 0$, it follows that

$\lim_{n\to\infty} F(Y_n) = F(Y),$

where F(Y) is the distribution function determined by the probability density

$(2\pi)^{-pm/2} \tau^{-1/2} \exp\left[-\tfrac{1}{2} \sum_{r,s=1}^{pm} \tau^{rs} y_{k+1, r-km} \, y_{l+1, s-lm}\right],$

where, if $r \leq m$, $s \leq m$, then $k = 0$, $l = 0$; if $r \leq m$, $m < s \leq 2m$, then $k = 0$,
$l = 1$; and so on.

The proof is omitted. 

If $Z_1, \ldots, Z_t$ are random variables, then $F(X_1, \ldots, X_k \mid Z_1, \ldots, Z_t)$ is
the distribution function of the random vectors $X_1, \ldots, X_k$ for fixed values of
$Z_1, \ldots, Z_t$, i.e. for any fixed values of $Z_1, \ldots, Z_t$,

$P\{X_1 < X_1^0, \ldots, X_k < X_k^0\} = F(X_1^0, \ldots, X_k^0 \mid Z_1, \ldots, Z_t).$

We shall now assume that the elements $c_{\gamma\nu n}$ of the matrix $C_n$ are Borel
measurable functions of a set of random variables 11 $Z_1, \ldots, Z_{t_n}$. Then the matrix
$C_n$ may be called a random matrix defined on a space $W_n$ which is the combinatory
product of the spaces on which $Z_1, \ldots, Z_{t_n}$ are defined. If, for each value
of n, and for all $X^n$ and $Z^n$, the equation

(2.5) $F(X^n, Z^n) = F(Z^n) \cdot \prod_\nu F(X_\nu \mid Z^n)$

is satisfied, then we shall say that $\mathcal{S}$ is true. It is obvious that sufficient
conditions for the truth of $\mathcal{S}$ are

$F(X^n, Z^n) = F(Z^n) \cdot \prod_\nu F(X_\nu)$

or, if $t_n > n$,

$F(X^n, Z^n) = F(Z_{n+1}, \ldots, Z_{t_n}) \cdot \prod_\nu F(X_\nu, Z_\nu)$

11 The symbol $X^n$ will stand for the set of variables $X_1, \ldots, X_n$, and the symbol $Z^n$
will stand for the set of variables $Z_1, \ldots, Z_{t_n}$.





or, if $t_n < n$,

$F(X^n, Z^n) = \prod_{\nu=1}^{t_n} F(X_\nu, Z_\nu) \cdot \prod_{\nu=t_n+1}^{n} F(X_\nu).$

Inasmuch as we shall often use Fubini's theorem, it is now stated here. 12

Theorem II. Let the distribution function of $X^n$, $Z^n$ be $F(X^n, Z^n)$, let the
distribution function of $X^n$ for fixed values of $Z^n$ be $F(X^n \mid Z^n)$, and let the
distribution function of $Z^n$ be $F(Z^n)$. Then if $A(X^n, Z^n)$ is measurable with respect to
$F(X^n, Z^n)$ and if

$\int_{R^{pn} \times W_n} | A(X^n, Z^n) | \, dF(X^n, Z^n) < \infty,$

it follows that

$\int_{R^{pn}} | A(X^n, Z^n) | \, dF(X^n \mid Z^n) < \infty$

for almost all 13 sets of values of $Z^n$ and

$\int_{R^{pn} \times W_n} A(X^n, Z^n) \, dF(X^n, Z^n) = \int_{W_n} \left[ \int_{R^{pn}} A(X^n, Z^n) \, dF(X^n \mid Z^n) \right] dF(Z^n).$

In Corollary I an important condition was that the maximum of the absolute
values of the elements of $C_n$ should approach zero as n increased. In order to
obtain a similar condition when the elements of $C_n$ are random variables, we
shall define the function $d(C_n)$ as follows: For each value of $Z^n$ let $d(C_n)$ be the
maximum of the absolute values of the elements of $C_n$. We shall denote
$d(C_n)$ by $d_n$. If the elements of $C_n$ are Borel measurable functions then $d_n$ is a
Borel measurable function of $Z^n$. Hence $d_n$ is a random variable defined on $W_n$.

A sequence of random variables $d_1, d_2, \ldots$ is said to converge in probability
to zero if, given $\epsilon > 0$, then

$\lim_{n\to\infty} P\{ | d_n | > \epsilon \} = 0.$

If the sequence of functions $d_p, d_{p+1}, \ldots$ converges in probability to zero we
shall say that $\mathcal{Z}$ is true.

If $\mathcal{S}$ is true, and if, for almost all values of $Z^n$ we have

(2.6) $\int_{R^p} x_{i\nu} \, dF(X_\nu \mid Z^n) = 0,$

(2.7) $\int_{R^p} x_{i\nu} x_{j\nu} \, dF(X_\nu \mid Z^n) = \sigma_{ij},$

12 Proofs of Fubini's theorem with the required amount of generality will be found in
[5, p. 101] and [14, p. 73].

13 A proposition concerning random variables is said to be true for almost all values of
the variables, if it is true for all values of the variables, except perhaps for a set of
probability zero with respect to the distribution function of the random variables.
and the condition $\mathfrak{L}_p$ is satisfied with respect to the $X_\nu$ and the distribution
functions $F(X_\nu, Z^n)$, then we shall say that $\mathcal{H}^0_p$ is true.

If

(2.8) $\sum_\nu \int_{R^p \times W_n} c_{\gamma\nu n} c_{\delta\nu n} x_{i\nu} x_{j\nu} \, dF(X_\nu, Z^n) = \sigma_{ij} \delta_{\gamma\delta},$

then we shall say that $\mathcal{C}^0$ is true. It is noted that if $\mathcal{S}$ and (2.7) are true, then
$\mathcal{C}^0$ is true if $\mathcal{C}$ is true for almost all sets of fixed values of $Z^n$.

Corollary III. Let $\mathcal{C}^0$, $\mathcal{S}$ and $\mathcal{H}^0_p$ be true. Then, if $\mathcal{Z}$ is true, it follows that

$\lim_{n\to\infty} F(Y_n) = \prod_\gamma N(y_{1\gamma}, \ldots, y_{p\gamma}; (\sigma)).$

Proof. It is necessary to show that the condition $\mathfrak{L}_{pm}$ is satisfied by the
variables $c_{\gamma\nu n} x_{i\nu}$ if the condition $\mathfrak{L}_p$ is satisfied by the variables $x_{i\nu}$, and that the
condition $\mathcal{Z}$ implies that $\lim_{n\to\infty} d'_n = 0$ when the $x_{i\gamma\nu n}$ of Theorem I are set equal
to the $c_{\gamma\nu n} x_{i\nu}$ of Corollary III.

If we let = E {c y , n Xi V f, A\ = E A 2 „ and let si = E [A 2 „), then, by (2 8), 

y,i * 

it is true that 

s!, = Eir» = inE a,,. 

y.i * 

From X), and the fact that for sufficiently large n, | dl(Z") | < 1 for almost all 
Z" we have for any preassigned « and S, 


4 [ A 2 dF(X n , Z n ) < 4 E f mdl(Z n ) E dF(X v , Z") < S 


for sufficiently large n, since the set of x’b and Z n for which E > (S n con- 

t,y 

tains almost all the z’s and Z n for which A„ > es„ . Hence, the condition 
£ pm is satisfied by the random variables Cy,„x„ with respect to the distribution 
functions F(X,, Z n ). 

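We now show that

$\lim_{n\to\infty} \left[ \max E\{(c_{\gamma\nu n} x_{i\nu})^2\} \right] = 0.$

It is clearly true that

$E\{(c_{\gamma\nu n} x_{i\nu})^2\} \leq \int d_n^2 x_{i\nu}^2 \, dF(X_\nu, Z^n).$

Since $d_n$ converges in probability to zero, and since $d_n^2 \leq 1$ for almost all $Z^n$,
we can, for any $\epsilon > 0$, take $n_0$ so large that if $n > n_0$, then $P\{d_n^2 > \tfrac{1}{2}\epsilon\} < \tfrac{1}{2}\epsilon$.
If E is the set on which $d_n^2 > \tfrac{1}{2}\epsilon$, we then have for all $n > n_0$, using (2.7),

$E\{(c_{\gamma\nu n} x_{i\nu})^2\} \leq \int_E \left[ \int_{R^p} x_{i\nu}^2 \, dF(X_\nu \mid Z^n) \right] dF(Z^n)
 + \tfrac{1}{2}\epsilon \int_{W_n - E} \left[ \int_{R^p} x_{i\nu}^2 \, dF(X_\nu \mid Z^n) \right] dF(Z^n) < \epsilon \sigma_{ii}$

and this inequality is also satisfied for all $n > n_0$.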
The following discussion is useful in obtaining the limiting distributions of
statistics which occur in multivariate statistical analysis.

The letter f will assume all integral values from 1 through s, the letters $\mu$, $\nu$
will assume all integral values from 1 through $n_f$, and the letters $\gamma$, $\delta$ will assume
all integral values from 1 through $m_f$, for any f.

Let $X^f_1, X^f_2, \ldots$ be, for any fixed f, a sequence of random vectors of $p_f$
components defined on $R^{p_f}$, and let the set of random variables $X^f_1, X^f_2, \ldots$ be
independently distributed for any fixed f.

If, for each set of values of $n_1, \ldots, n_s$, ($t_n$ is a function of $n_1, \ldots, n_s$),

$F(X^1, \ldots, X^s, Z^n) = \prod_f \prod_\nu F(X^f_\nu \mid Z_1, \ldots, Z_{t_n}) \cdot F(Z_1, \ldots, Z_{t_n}),$

we shall say that $\mathcal{S}_s$ is true.

Let, for any fixed value of f, the matrix 14 $C^f_n = \| c^f_{\gamma\nu n} \|$, where the $c^f_{\gamma\nu n}$ are Borel
measurable functions of $X^k$, ($k < f$), and 15 $Z^n$, have the same properties as
$C_n$; and let $d(C^f_n)$ be the same function of $C^f_n$ that $d(C_n)$ is of $C_n$. We shall
denote $d(C^f_n)$ by $d^f_n$.

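Let

$y^f_{i\gamma n} = \sum_\nu c^f_{\gamma\nu n} x^f_{i\nu}$

and let $Y^f_n = \| y^f_{i\gamma n} \|$.

For fixed f, the $p_f$ rowed square matrix $(\sigma^f)$, its inverse, and so on are defined
as were the same functions of the $\sigma_{ij}$ earlier in this paragraph but with $\sigma^f_{ij}$
replacing $\sigma_{ij}$, where

$E\{x^f_{i\nu}\} = 0$

and

$E\{x^f_{i\mu} x^f_{j\nu}\} = \sigma^f_{ij} \delta_{\mu\nu}.$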


If $\mathcal{S}_s$ is true, and if for almost all values of $Z^n$ we have

(2.9) $\int_{R^{p_f}} x^f_{i\nu} \, dF(X^f_\nu \mid Z^n) = 0,$

(2.10) $\int_{R^{p_f}} x^f_{i\nu} x^f_{j\nu} \, dF(X^f_\nu \mid Z^n) = \sigma^f_{ij},$

and the condition $\mathfrak{L}_{p_f}$ is satisfied with respect to the $X^f_\nu$ and the distribution
functions $F(X^f_\nu, Z^n)$, then we shall say that $\mathcal{H}^f_{p_f}$ is true.

If

(2.11) $\sum_\nu \int c^f_{\gamma\nu n} c^f_{\delta\nu n} x^f_{i\nu} x^f_{j\nu} \, dF(X^f_\nu, Z^n) = \sigma^f_{ij} \delta_{\gamma\delta},$


14 The superscripts f and k will not indicate multiplication but will only be indices.
15 See footnote 11.

then we shall say that $\mathcal{C}^f$ is true. It is noted that if $\mathcal{S}_s$ and (2.10) are true
then $\mathcal{C}^f$ is true if $\mathcal{C}$ is true for almost all sets of fixed values of $X^1, \ldots, X^{f-1}$,
$Z^n$.

If $d^f_n$ converges in probability to zero as n increases we shall say that $\mathcal{Z}^f$ is
true.

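Corollary IV. Let $\mathcal{C}^f$, $\mathcal{S}_s$, and $\mathcal{H}^1_{p_1}, \ldots, \mathcal{H}^s_{p_s}$ be true. Then, if $\mathcal{Z}^1, \ldots, \mathcal{Z}^s$
are true, it follows that

$\lim_{n_1, \ldots, n_s \to \infty} F(Y^1, \ldots, Y^s) = \prod_f F(Y^f),$

where

$F(Y^f) = \prod_\gamma N(y^f_{1\gamma}, \ldots, y^f_{p_f \gamma}; (\sigma^f)).$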

The proof is almost identical with the proof of Corollary III of which this 
corollary is an extension. 

It is remarked that if the statistics, the limiting distributions of which are 
desired, are associated with the normal distribution, as are most statistics 
studied, then Corollary IV may not be the best tool to use. This is a conse¬ 
quence of the fact that such statistics are generally expressible as functions of 
uncorrelated random variables and hence are more simply discussed, using 
Corollary I. 

3. Limiting distributions of quadratic and bilinear forms. We first assume
the coefficients of the forms to be constants. For each set of values of i, j, and
n, the matrix of the bilinear form with coefficients which are real numbers,

(3.1) $b^n_{ij} = \sum_{\mu,\nu} a_{\mu\nu n} x_{i\mu} x_{j\nu},$

will be denoted by $A_n$, and the rank of $A_n$ will be denoted by m. The maximum
of the absolute values of the elements of $A_n$ will be denoted by $b_n$. We shall
assume that there exists an orthogonal transformation,

(3.2) $y_{i\delta n} = \sum_\nu c_{\delta\nu n} x_{i\nu},$

of $x_{i1}, \ldots, x_{in}$ such that

(3.3) $b^n_{ij} = \sum_\delta \lambda_\delta y_{i\delta n} y_{j\delta n},$

where the coefficients $\lambda_\delta$ are non-negative. 16

Lemma I. If $d_n$ is the maximum of the absolute values of the elements $c_{\delta\nu n}$,
then a necessary and sufficient condition that $\lim b_n = 0$ is $\lim d_n = 0$.


16 Our theorems will not be applicable if some of the $\lambda_\delta$ are negative and some are positive.
However if all the $\lambda_\delta$ are non-positive then the theorems will remain true.
Proof. From (3.1) it follows that

$a_{\mu\nu n} = \sum_\delta \lambda_\delta c_{\delta\mu n} c_{\delta\nu n}.$

Hence, $b_n \geq \lambda_\delta c_{\delta\mu n}^2$ and $| a_{\mu\nu n} | \leq d_n^2 \left( \sum_\delta \lambda_\delta \right)$. The remainder of the proof
is obvious.
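
The two inequalities in the proof can be checked numerically. The Python sketch below is illustrative only: the cosine rows, the weights `lam` and the function names are assumptions made for the example. It takes every $\lambda_\delta$ to be strictly positive, so that the first inequality gives the lower bound $(\min_\delta \lambda_\delta) d_n^2 \leq b_n$.

```python
import math

def cosine_rows(m, n):
    """Hypothetical choice of m orthonormal rows in R^n (requires m < n)."""
    scale = math.sqrt(2.0 / n)
    return [[scale * math.cos(math.pi * (nu + 0.5) * g / n) for nu in range(n)]
            for g in range(1, m + 1)]

def form_matrix(C, lam):
    """a_{mu nu} = sum over delta of lam_delta * c_{delta mu} * c_{delta nu}."""
    n = len(C[0])
    return [[sum(l * row[mu] * row[nu] for l, row in zip(lam, C))
             for nu in range(n)]
            for mu in range(n)]

def lemma_bounds(C, lam):
    """Return (lower, b_n, upper) with lower <= b_n <= upper, assuming lam > 0."""
    A = form_matrix(C, lam)
    b_n = max(abs(a) for row in A for a in row)
    d_n = max(abs(c) for row in C for c in row)
    return min(lam) * d_n ** 2, b_n, d_n ** 2 * sum(lam)

lam = [1.0, 2.0, 0.5]  # illustrative positive weights
for n in (8, 32, 128):
    lower, b_n, upper = lemma_bounds(cosine_rows(3, n), lam)
    print(n, lower, b_n, upper)
```

Because $b_n$ is squeezed between two positive multiples of $d_n^2$, either sequence tends to zero exactly when the other does, which is the content of Lemma I in this case.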

The following theorem will be the basis for a large sample analogue of
Wishart's distribution.

Theorem III. Let $\mathcal{H}_p$ be true. Then, a sufficient condition that

$\lim_{n\to\infty} F(Y_n) = \prod_\gamma N(y_{1\gamma}, \ldots, y_{p\gamma}; (\sigma)),$

where $b^n_{ij} = \sum_\delta \lambda_\delta y_{i\delta n} y_{j\delta n}$, is $\lim_{n\to\infty} b_n = 0$.

Proof. According to Lemma I, the fact that $\lim_{n\to\infty} b_n = 0$ implies that
$\lim_{n\to\infty} d_n = 0$. The $c_{\gamma\nu n}$ are such that $\mathcal{C}$ is true. Hence the hypotheses of
Corollary I are satisfied and the theorem is proved.

Before stating the corollary to Theorem III, we shall prove an obvious lemma
which is of constant service.

Lemma II. Let $\lim_{n\to\infty} F(X_n) = F(X)$ at all points of continuity of F(X), and let

$g_{1n} = g_1(x_{1n}, \ldots, x_{pn}), \ \ldots, \ g_{kn} = g_k(x_{1n}, \ldots, x_{pn})$

be Borel measurable functions of their indicated variables for each value of n,
($p \geq k$), defined on $R^p$.

Then

$\lim_{n\to\infty} F(g_{1n}, \ldots, g_{kn}) = F(g_1, \ldots, g_k)$

at all points of continuity of $F(g_1, \ldots, g_k)$, where $g_\alpha = g_\alpha(x_1, \ldots, x_p)$.

Proof. By (2.1), we have

(3.4) $E\left[e^{i \sum_\alpha t_\alpha g_{\alpha n}}\right] = E\left[e^{i \sum_\alpha t_\alpha g_\alpha(x_{1n}, \ldots, x_{pn})}\right],$

where, since $g_\alpha(x_1, \ldots, x_p)$ is a Borel measurable function of $x_1, \ldots, x_p$, we
know that $g_{1n}, \ldots, g_{kn}$ have a joint distribution function $F(g_{1n}, \ldots, g_{kn})$.
Then, since $\lim_{n\to\infty} F(X_n) = F(X)$ at all points of continuity of F(X), we have 17

$\lim_{n\to\infty} E\left[e^{i \sum_\alpha t_\alpha g_\alpha(x_{1n}, \ldots, x_{pn})}\right] = E\left[e^{i \sum_\alpha t_\alpha g_\alpha(x_1, \ldots, x_p)}\right]$

uniformly in every $t_1, \ldots, t_k$ interval since

$\left| E\left[e^{i \sum_\alpha t_\alpha g_\alpha(x_{1n}, \ldots, x_{pn})}\right] - E\left[e^{i \sum_\alpha t_\alpha g_\alpha(x_1, \ldots, x_p)}\right] \right|
 \leq \int | dF_n(x_1, \ldots, x_p) - dF(x_1, \ldots, x_p) |,$


17 See Cramer, [3, p. 30] and "Additional Note" at the end of the book.

where $F_n(x_1, \ldots, x_p)$ stands for $F(x_{1n}, \ldots, x_{pn})$, when $x_i$ and $x_{in}$ have
the same numerical values. It follows from (3.4) that

$\lim_{n\to\infty} E\left[e^{i \sum_\alpha t_\alpha g_{\alpha n}}\right] = E\left[e^{i \sum_\alpha t_\alpha g_\alpha}\right]$

uniformly in every $t_1, \ldots, t_k$ interval, and consequently

$\lim_{n\to\infty} F(g_{1n}, \ldots, g_{kn}) = F(g_1, \ldots, g_k)$

at all points of continuity of $F(g_1, \ldots, g_k)$.

The real valued function $G_d(x; n, c)$ will be defined by the equations

$G_d(0; 0, c) = 1, \quad (-\infty < c < \infty),$

$G_d(x; n, c) = [\Gamma(\tfrac{1}{2}n)]^{-1} (2c)^{-n/2} x^{n/2 - 1} \exp\left(-\frac{x}{2c}\right), \quad (0 < x < \infty; \ c > 0; \ n > 0),$

and $G_d(x; n, c) = 0$ otherwise. The function $G(x; n, c)$ will be defined by the
equation

$G(x; n, c) = \int_0^x G_d(t; n, c) \, dt.$
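
For $n > 0$ and $c > 0$ the density above is that of c times a chi-square variable with n degrees of freedom. The Python sketch below evaluates it and approximates $G(x; n, c)$ by a midpoint rule. It is a hedged illustration: the function names and the choice of quadrature are assumptions made here, and the degenerate case n = 0 of the definition is deliberately not modelled.

```python
import math

def g_density(x, n, c):
    """G_d(x; n, c) for n > 0 and c > 0; returns 0 outside 0 < x < infinity.

    The degenerate case n = 0 of the definition is not handled here."""
    if x <= 0 or n <= 0 or c <= 0:
        return 0.0
    return (x ** (n / 2 - 1) * math.exp(-x / (2 * c))
            / (math.gamma(n / 2) * (2 * c) ** (n / 2)))

def g_cdf(x, n, c, steps=20000):
    """G(x; n, c): integral of the density from 0 to x by the midpoint rule.

    Midpoints avoid evaluating the density at x = 0, where it is unbounded
    when n < 2."""
    if x <= 0:
        return 0.0
    h = x / steps
    return h * sum(g_density((k + 0.5) * h, n, c) for k in range(steps))

# With n = 2 and c = 1 the density reduces to exp(-x/2)/2, so that
# G(x; 2, 1) = 1 - exp(-x/2); the quadrature can be compared with this.
approx = g_cdf(2.0, 2, 1.0)
exact = 1.0 - math.exp(-1.0)
print(approx, exact)
```

The comparison with the closed form for n = 2 is only a sanity check on the reconstruction of the density; for other values of n a library routine for the incomplete gamma function would normally be preferred to this simple quadrature.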

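The real valued function $G_d(x_{11}, x_{12}, \ldots, x_{pp}; n, (\sigma))$ will be defined by the
equations

$G_d(0, \ldots, 0; p - 1, (\sigma)) = 1,$

$G_d(x_{11}, \ldots, x_{pp}; n, (\sigma)) = \sigma^{-n/2} \left[ 2^{np/2} \pi^{p(p-1)/4} \prod_{i=1}^{p} \Gamma(\tfrac{1}{2}(n - i + 1)) \right]^{-1} | x |^{(n-p-1)/2}
 \cdot \exp\left[-\tfrac{1}{2} \sum_{i,j} \sigma^{ij} x_{ij}\right], \quad (0 < x_{ii} < \infty; \ x_{ij}^2 < x_{ii} x_{jj}); \ (\sigma)$ is positive definite,

where $| x |$ is the determinant $| x_{ij} |$ and $G_d(x_{11}, \ldots, x_{pp}; n, (\sigma)) = 0$ otherwise.
The function $G(x_{11}, \ldots, x_{pp}; n, (\sigma))$ will be defined by the equation

$G(x_{11}, \ldots, x_{pp}; n, (\sigma)) = \int_{-\infty}^{x_{pp}} \cdots \int_{-\infty}^{x_{11}} G_d(t_{11}, \ldots, t_{pp}; n, (\sigma)) \, dt_{11} \, dt_{12} \cdots dt_{pp}.$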

We can now state the limiting distribution analogue of Wishart's distribution.

Corollary V. If $\mathcal{H}_p$ is true, if $\lambda_\delta = 1$, and if $m > p$ then

$\lim_{n\to\infty} F(b^n_{11}, b^n_{12}, \ldots, b^n_{pp}) = G(b_{11}, \ldots, b_{pp}; m, (\sigma)).$

Proof. The conditions of Theorem III and Lemma II are satisfied.

Obviously for fixed i, the limiting distribution of $b^n_{ii}$ is $G(b; m, \sigma_{ii})$, and if
$i \neq j$, the limiting distribution of $b^n_{ij}/m$ is the distribution of the covariance of
$x_i$ and $x_j$ in a sample of m independent pairs of observations. 18


18 See Wishart and Bartlett, [1, p. 266].
We proceed to the analogue for limiting distributions of one of our
generalizations of the Fisher-Cochran theorem. It is first desirable to give some
additional definitions.

We consider the bilinear forms

(3.5) $b^n_{ij\alpha} = \sum_{\mu,\nu} a^\alpha_{\mu\nu n} x_{i\mu} x_{j\nu}$

with real coefficients, and we denote the matrix of $b^n_{ij\alpha}$ by $A^\alpha_n$. The rank of
$A^\beta_n$ is $m_\beta$, and the rank of $A^k_n$ is $m_{kn}$. If the maximum of the absolute values
of the elements of $A^1_n, \ldots, A^{k-1}_n$ is $b_n$, and if there exists an orthogonal
transformation,

(3.6) $y_{i\delta n} = \sum_\nu c_{\delta\nu n} x_{i\nu},$

of $x_{i1}, \ldots, x_{in}$ such that

$b^n_{ij\alpha} = \sum_\delta \lambda_\delta y_{i\delta n} y_{j\delta n},$

where $\delta$ assumes all integral values from $m_1 + \cdots + m_{\alpha-1} + 1$ through
$m_1 + \cdots + m_\alpha$ and $\lambda_\delta$ is non-negative, then it is easy to prove, as in Lemma I,
that a necessary and sufficient condition that $\lim_{n\to\infty} b_n = 0$ is $\lim_{n\to\infty} d_n = 0$, where
$d_n$ is the maximum of the absolute values of the elements $c_{\delta\nu n}$.

Lemma III. Let $m = m_1 + \cdots + m_{k-1}$ and let

(3.7) $\sum_\alpha b^n_{ij\alpha} = \sum_\nu x_{i\nu} x_{j\nu}.$

Then, a necessary and sufficient condition that

$b^n_{ij\alpha} = \sum_\delta y_{i\delta n} y_{j\delta n},$

where the real linear functions, $y_{i\delta n}$, of $x_{i1}, \ldots, x_{in}$ are given by (3.6), the linear
functions (3.6) not now being assumed to be orthogonal, is

(3.8) $m_{kn} = n - m.$

Furthermore, the functions (3.6) are orthogonal.

The proof of this lemma for the case p = 1 is given in [16]. The procedure
to follow in extending the lemma to the cases where p > 1 is given in [15, p.
473]. It is noted that this lemma is more general than the lemma in [15]
inasmuch as we show that the orthogonality of the transformation is a
consequence of our hypotheses and not one of the hypotheses. 19


19 It is noted, however, that the increase in generality affects only the necessity, not
the sufficiency, of the theorem.
Theorem IV. Let $\mathcal{H}_p$, (3.7) and (3.8) be true for all values of n, and suppose
that $\lim_{n\to\infty} b_n = 0$. Then

$\lim_{n\to\infty} F(Y_n) = \prod_\gamma N(y_{1\gamma}, \ldots, y_{p\gamma}; (\sigma)),$

where $b^n_{ij\alpha} = \sum_\delta y_{i\delta n} y_{j\delta n}$.

The proof is omitted.

Corollary VI. If the hypotheses of Theorem IV are assumed, and if $m_\beta > p$,
($\beta = 1, \ldots, h$; $h < k$), then

lilTl Fftm , • • • , ipphf l/lA+Ii» > * * ' ) 2/jJwtn) 

ft—►oo 

A ftl 

= H G(buy ) ' ‘ ' 1 bppy I Vly ) (ff)) • II N(lJly I ' • ■ j 2 /p7 , {a)). 
y**l y—h+1 

If p = 1 in Theorem IV and Corollary VI, we have the large sample analogue
of the Fisher-Cochran theorem.

We now discuss limiting distributions of random variables which are bilinear
and quadratic forms in one set of chance variables for fixed values of other
random variables. We consider the coefficients $a_{\mu\nu n}$ and $a^\alpha_{\mu\nu n}$ of $b^n_{ij}$ and $b^n_{ij\alpha}$ to be
random variables. Hence the matrices $A_n$ and $A^\alpha_n$ are random matrices.

To be more explicit, let $X^f_1, X^f_2, \ldots$ be a sequence of random vectors, the
random vector $X^f_\nu$ having $p_f$ components $x^f_{1\nu}, \ldots, x^f_{p_f \nu}$, and being defined on
$R^{p_f}$. The set of random vectors $X^f_\nu$ and $Z_1, \ldots, Z_{t_n}$ will be assumed to be
independent.

For each value of f the coefficients of the bilinear forms

(3.9) $b^{n_f}_{ij\alpha_f} = \sum_{\mu,\nu} a^{\alpha_f}_{\mu\nu n_f} x^f_{i\mu} x^f_{j\nu}, \quad (i, j = 1, \ldots, p_f; \ \alpha_f = 1, \ldots, k_f)$

will be assumed to be Borel measurable functions of the random vectors
$X^1, \ldots, X^{f-1}$ and $Z_1, \ldots, Z_{t_n}$.

The matrix of $b^{n_f}_{ij\alpha_f}$ is denoted by $A^{\alpha_f}_{n_f}$. The rank of $A^{\beta_f}_{n_f}$ is $m_{\beta f}$, and the rank
of $A^{k_f}_{n_f}$ is $m_{k n_f}$ for all sets of values of the $a^{\alpha_f}_{\mu\nu n_f}$ except, perhaps, on a set $E_{n_f}$,
which is such that $\lim P(E_{n_f}) = 0$.

Let the function $b(A^{\beta_f}_{n_f})$ be defined as follows:

For each set of values of the $X^h$ and Z let $b(A^{\beta_f}_{n_f})$ be the maximum of the
absolute values of the elements of the $A^{\beta_f}_{n_f}$. We shall denote $b(A^{\beta_f}_{n_f})$ by $b^f_{n_f}$. Obviously,
$b^f_{n_f}$ is a Borel measurable function of $X^h$ and Z. Hence

$b^f_{n_f} = b(A^{\beta_f}_{n_f})$

is a random variable defined on $W \times R^{n_1 p_1 + \cdots + n_{f-1} p_{f-1}}$.
For each value of f, and for almost all sets of fixed values of the $X^h$,
($h = 1, \ldots, f - 1$), we shall assume that there exists an orthogonal transformation,

(3.10) $y^f_{i\lambda n_f} = \sum_\nu c^f_{\lambda\nu n_f} x^f_{i\nu},$

of $x^f_{i1}, \ldots, x^f_{i n_f}$ such that 20

(3.11) $b^{n_f}_{ij\alpha_f} = \sum_\lambda y^f_{i\lambda n_f} y^f_{j\lambda n_f},$

where $\lambda$ assumes all integral values from $m_{1f} + \cdots + m_{\alpha-1, f} + 1$ through
$m_{1f} + \cdots + m_{\alpha f}$. The coefficients $c^f_{\lambda\nu n_f}$ of the linear forms (3.10) are real
single valued Borel measurable functions of the coefficients $a^{\alpha_f}_{\mu\nu n_f}$ of the bilinear
forms (3.9) for fixed values of the $X^h$ and $Z^n$. Let $c^f_{\lambda\nu n_f}$ be the same function
of the functions $a^{\alpha_f}_{\mu\nu n_f}$ that $c_{\delta\nu n}$ is of the coefficients of the bilinear forms having
constant coefficients. Furthermore, let $d^f_{n_f}$ be the same function of the matrix

$C^f_{n_f} = \| c^f_{\lambda\nu n_f} \|,$

where $m_f = m_{1f} + \cdots + m_{k_f - 1, f}$, that $b^f_{n_f}$ is of $A^{\beta_f}_{n_f}$.

Lemma IV. A necessary and sufficient condition that $b^f_{n_f}$ converge in probability
to zero as n increases is that $d^f_{n_f}$ converge in probability to zero as n increases.


Proof. Since 


we have 


*/-1 


Z /‘V = Z c{„ n ,o{, 

(9-1 X 


”>/ » 


(Zb/ - 1)6^ > *Z > [cU] 2 

/3-1 


and 


k"' a ,| < {Z [/,»/-Z [Zn/!* < fiwWi/, 


where $\lambda$ assumes all integral values from $m_{1f} + \cdots + m_{\alpha-1, f} + 1$ through
$m_{1f} + \cdots + m_{\alpha f}$. The remainder of the proof is obvious.

In proving Theorem V we shall use a generalization of Lemma III which is 
proved in [15, p. 473]. 

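Theorem V. Let $\mathcal{H}^1_{p_1}, \ldots, \mathcal{H}^s_{p_s}$ be true, and suppose that

$\sum_{\alpha_f} b^{n_f}_{ij\alpha_f} = \sum_\nu x^f_{i\nu} x^f_{j\nu}.$

Then, if $b^f_{n_f}$ converges in probability to zero as n increases and if $m_f = n_f - m_{k n_f}$
for all values of $n_f$, it follows that

$\lim_{n_1, \ldots, n_s \to \infty} F(y^1_{11 n_1}, \ldots, y^s_{p_s m_s n_s}) = \prod_f \prod_{\gamma=1}^{m_f} N(y^f_{1\gamma}, \ldots, y^f_{p_f \gamma}; (\sigma^f)).$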

The proof is omitted. 


20 It is not necessary that the $\lambda$'s be set equal to one as in (3.11). It is only somewhat
easier to state the results.
Corollary VII. If $m_{\alpha f} > p_f$, then

$\lim_{n_1, \ldots, n_s \to \infty} F(b^{n_1}_{111}, \ldots, b^{n_s}_{p_s p_s, k_s - 1}) = \prod_f \prod_{\beta_f = 1}^{k_f - 1} G(b_{11\beta_f}, \ldots, b_{p_f p_f \beta_f}; m_{\beta f}, (\sigma^f)).$

The proof is omitted. 

Finally, let us assume that the vectors $X^f_\nu$, for fixed $\nu$ are uncorrelated and
for fixed f are independent. By that, we shall mean that $E(x^f_{i\nu} x^g_{j\nu}) = 0$, $f \neq g$,
and that for all n the set of random vectors $X^f_\nu$ are independent for the same or
different superscripts providing the subscripts are all different. Let us also
assume that the coefficients of the forms (3.9) are real numbers. Thus we have 
weakened the hypotheses of Theorem V concerning the random vectors, and we
have strengthened the hypotheses of Theorem V concerning the forms (3.9).
Inasmuch as we are generally concerned with the limiting distributions of
statistics which occur in the analysis of the normal distribution, and many such
statistics have been shown to be invariant under transformations into
uncorrelated random variables, 21 Theorem VI and Corollary VIII will often be
applicable.

Theorem VI. The statement of Theorem V is repeated.

Corollary VIII. The statement of Corollary VII is repeated.

Another extension of these theorems may be obtained by allowing all the
n's to be equal, i.e. $n_1 = \cdots = n_s = n$, and by putting conditions on the forms
(3.9) which enable us to say that for fixed i, f, $\gamma$ and n, the set of random variables
$c^f_{\gamma\nu n} x^f_{i\nu}$ are independently distributed. Theorem I could then be used to obtain
a very general result. However, except for the case dealt with above, the
condition of independence appears to be rather restrictive, and the theorem is
omitted.

4. Applications. We first state the strong law of large numbers and a
lemma which is very useful in the discussion of limiting distributions.

A sequence of random variables $X_1, X_2, \ldots$ will be said to converge with
probability one 22 to a random variable X if

$\lim_{n\to\infty} P\{ | X_n - X | < \epsilon, \ | X_{n+1} - X | < \epsilon, \ \ldots, \ | X_{n+p} - X | < \epsilon \} = 1$

for every value of p > 0, uniformly in p for every positive number $\epsilon$. Upon
setting p = 1, it is seen that convergence with probability one implies
convergence in probability.

The strong law of large numbers 23 asserts that if the independent random
variables $X, X_1, X_2, \ldots$ all have the same distribution function, and if E(X) is
finite, then the sequence of arithmetic means $\frac{1}{n} \sum_\nu X_\nu$ converges with
probability one to E(X).


21 The regression transformation which yields the uncorrelated variables will be found
in [15, p. 470, (3.2)].

22 See Doob [4, p. 163], and Fréchet, [9, p. 228].

23 See Doob [4, p. 163], and Fréchet, [9, p. 259]. A complete proof is given by Fréchet.
Hence, if $E(x_{i\nu}) = 0$ and if $\sigma_{ij}$ is finite, then $\frac{1}{n} \sum_\nu x_{i\nu} x_{j\nu} = s'_{ijn}$ converges with
probability one to $\sigma_{ij}$. Since

$\sum_\nu (x_{i\nu} - \bar{x}_{in})(x_{j\nu} - \bar{x}_{jn}) = \sum_\nu x_{i\nu} x_{j\nu} - n \bar{x}_{in} \bar{x}_{jn},$

where $\bar{x}_{in}$ is the arithmetic mean of $x_{i1}, \ldots, x_{in}$, and since $\bar{x}_{in}$ converges with
probability one to zero, it follows that $s_{ijn} = s'_{ijn} - \bar{x}_{in} \bar{x}_{jn}$ converges with
probability one to $\sigma_{ij}$. It is, of course, assumed that the random variables
$x_{i\nu}$, $x_{j\nu}$ have the same joint distribution function for all values of $\nu$, and that
the random vectors $X_\nu$ are independently distributed. The process of the
reduction of $s_{ijn}$ to $s'_{ijn}$ in the limit is an example of the possible uses of Lemma V.

Lemma V. If $\varphi(t_1, \ldots, t_p)$ is a continuous function of $t_1, \ldots, t_p$, and if the
sequence of random variables $x_{in}$ converges in probability (with probability one) to
$x_i$, which may be a random variable or a constant, then the sequence of random
variables $\varphi(x_{1n}, \ldots, x_{pn})$ converges in probability (with probability one) to
$\varphi(x_1, \ldots, x_p)$, where some or all of the x's may be constants. If $x_1, \ldots, x_p$ are
constants then $\varphi(t_1, \ldots, t_p)$ need only be continuous in the neighborhood of
$x_1, \ldots, x_p$ and Borel measurable.

For a proof of part of this lemma which may be extended to yield the entire
proof, see Fréchet, [9, p. 178].
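
The reduction of $s_{ijn}$ to $s'_{ijn}$ discussed before Lemma V rests on the identity $s_{ijn} = s'_{ijn} - \bar{x}_{in} \bar{x}_{jn}$, and the gap between the two moments is the product of two sample means. The Python sketch below checks the identity on data and shows the gap for increasing n. It is only an illustration: the function names and the simulated normal pairs are assumptions made for the example.

```python
import random

def mean(values):
    return sum(values) / len(values)

def raw_moment(x, y):
    """s'_{ijn}: (1/n) times the sum of products, taken about the true mean zero."""
    return sum(a * b for a, b in zip(x, y)) / len(x)

def central_moment(x, y):
    """s_{ijn}: (1/n) times the sum of products of deviations from sample means."""
    mx, my = mean(x), mean(y)
    return sum((a - mx) * (b - my) for a, b in zip(x, y)) / len(x)

def simulate_gap(n, seed=0):
    """|s' - s| for n hypothetical correlated pairs with zero means."""
    rng = random.Random(seed)
    x = [rng.gauss(0.0, 1.0) for _ in range(n)]
    y = [0.6 * a + rng.gauss(0.0, 1.0) for a in x]
    return abs(raw_moment(x, y) - central_moment(x, y))

for n in (10, 1000, 100000):
    print(n, simulate_gap(n))  # the gap equals |mean(x) * mean(y)|
```

Since both sample means converge to zero, their product does as well, and Lemma V applied to the continuous function $\varphi(t_1, t_2, t_3) = t_1 - t_2 t_3$ carries the convergence of $s'_{ijn}$ over to $s_{ijn}$.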

Using Lemma V it is easy to see that the coefficients $r_n$ of least squares
equations converge with probability one to their $\beta$ values, where the $\beta$ value
is obtained by substituting $\sigma_{ij}$ for $s_{ijn}$ in the expression for $r_n$, assuming, of
course, independent random vectors which have the same distribution functions.

Since problems in the analysis of variance may be interpreted as problems in 
least squares the above comments and Lemma V will generally make it possible, 
when determining limiting distributions, to consider the statistics to be func¬ 
tions of deviations from “true” mean functions rather than "sample” mean 
functions. 

We shall discuss, briefly, four applications of these results.

(a). The limiting distribution of the regression coefficient. Let $r_n$, the "sample"
regression coefficient, be defined by the equation

$r_n = \sum_\nu x_{i\nu} x_{j\nu} \Big/ \sum_\nu x_{i\nu}^2,$

where $x_{i\nu}$ and $x_{j\nu}$ are deviations from arithmetic means. If the random vectors
$(x_{i\nu}, x_{j\nu})$ are independently distributed for fixed i, j, with the same distribution
functions, and if $E(x_{i\nu}) = E(x_{j\nu}) = 0$, $E(x_{i\nu} x_{j\nu}) = \sigma_{ij}$, then it follows from the
strong law of large numbers that $\sum_\nu x_{i\nu} x_{j\nu}/n$ converges to $\sigma_{ij}$ with probability
one, and from the Laplace-Liapounoff theorem that $\sum_\nu x_{i\nu} x_{j\nu}/\sqrt{n}$ has a normal
limiting distribution with mean $\sigma_{ij}$ and variance $E\{(x_{i\nu} x_{j\nu} - \sigma_{ij})^2\}$. Hence, by
Lemma V, $\sqrt{n} \left( r_n - \frac{\sigma_{ij}}{\sigma_{ii}} \right)$ has a normal limiting distribution with mean zero
and variance $\lim_{n\to\infty} E\left\{ n \left( r_n - \frac{\sigma_{ij}}{\sigma_{ii}} \right)^2 \right\}$, unless that limit does not exist.
If the $x_{i\nu}$ are not random variables then, in order to apply Corollary I with
p = 1, it is necessary that


(4.1) 


lim 

n-»ac 




(E rivY 


= o. 


In that case, the limiting distribution of (2 is normal with zero mean 

v 

and variance a,, . If (4.1) is not satisfied then there is no assurance, unless 
the x,, are normally distributed, that the limiting distribution of (2®?*)^ 

V 

is normal. 

(b). The limiting distribution of the analysis of variance ratio. The tests of
significance which occur in the analysis of variance depend on the ratio of two
quadratic forms, $q_{1n}$ and $q_{2n}$, the denominator $q_{2n}$ having rank (or degrees of
freedom) $m_{2n}$ increasing with $n$, and the numerator $q_{1n}$ having rank $m_1$ not
changing with $n$, i.e.,

$$v_n = \frac{q_{1n}\, m_{2n}}{q_{2n}\, m_1},$$

where $q_{1n} + q_{2n} + q_{3n} = \sum_\nu x_\nu^2$ and $q_{3n}$ is a quadratic form of rank $m_{3n}$ which
will be identically zero if $n = m_1 + m_{2n}$. Since$^{24}$ $q_{2n}$ is expressible as the
variance of $x$ about a least squares equation it follows from the previous
discussion and Lemma IV that $q_{2n}/m_{2n}$ converges with probability one to $\sigma^2$ under the
assumptions that the $x_\nu$ are independently distributed with zero means and
variances $\sigma^2$. Hence the limiting distribution of $v_n$ will depend only on the
limiting distribution of $q_{1n}$ and it will consequently be necessary to consider
only the matrix of $q_{1n}$, in order to apply Corollary VI with $p = 1$. For
example,$^{25}$ if there are $pn$ independently distributed random variables $x_{i\nu}$ with
zero means and variances $\sigma^2$ arranged in $p$ blocks of $n$ random variables each,
then

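$$\sum_{i,\nu} x_{i\nu}^2 = pn\,\bar{x}_n^2 + n \sum_i (\bar{x}_{in} - \bar{x}_n)^2 + \sum_{i,\nu} (x_{i\nu} - \bar{x}_{in})^2,$$

where $\bar{x}_{in}$ is the arithmetic mean of $x_{i1}, \ldots, x_{in}$ and $\bar{x}_n$ is the arithmetic mean
of all the $x_{i\nu}$. Then

$$q_{1n} = n \sum_i (\bar{x}_{in} - \bar{x}_n)^2, \qquad q_{2n} = \sum_{i,\nu} (x_{i\nu} - \bar{x}_{in})^2,$$

$$m_1 = p - 1, \qquad m_{2n} = p(n - 1)$$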
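The block decomposition can be checked numerically. The following is a minimal sketch in Python; the function names `anova_decomposition` and `variance_ratio`, the name `q3` for the remaining term $pn\,\bar{x}_n^2$, and the small data set are assumptions made for illustration and do not come from the paper.

```python
from typing import List, Tuple


def anova_decomposition(blocks: List[List[float]]) -> Tuple[float, float, float, float]:
    """Split the total sum of squares of p blocks of n values each.

    Returns (total, q1, q2, q3) where q1 is the between-block form,
    q2 the within-block form and q3 = p*n*(grand mean)^2 (assumed name).
    """
    p = len(blocks)
    n = len(blocks[0])
    block_means = [sum(block) / n for block in blocks]
    grand_mean = sum(block_means) / p
    total = sum(x * x for block in blocks for x in block)
    q1 = n * sum((bm - grand_mean) ** 2 for bm in block_means)
    q2 = sum((x - bm) ** 2 for block, bm in zip(blocks, block_means) for x in block)
    q3 = p * n * grand_mean ** 2
    return total, q1, q2, q3


def variance_ratio(q1: float, q2: float, p: int, n: int) -> float:
    """v_n = q1 * m2 / (q2 * m1) with m1 = p - 1 and m2 = p * (n - 1)."""
    return (q1 * p * (n - 1)) / (q2 * (p - 1))


# Hypothetical data: p = 3 blocks of n = 3 values each.
blocks = [[1.0, 2.0, 3.0], [2.0, 4.0, 6.0], [0.0, -1.0, 1.0]]
total, q1, q2, q3 = anova_decomposition(blocks)
print(total, q1 + q2 + q3)  # the two sides of the identity
print(variance_ratio(q1, q2, p=3, n=3))
```

The ranks $m_1 = p - 1$ and $m_{2n} = p(n - 1)$ enter only through `variance_ratio`; the decomposition itself is purely algebraic and holds for any data.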


24 This has been proved by Kolodziejczyk, [12, p. 161].
25 Other schemes are given in Fisher, [8].

and the matrix of $q_{1n}$ may be obtained by substituting for the $\bar{x}_{in}$ and $\bar{x}_n$. In
this case it is sufficient to express $q_{1n}$ as $\sum_{i,j} a_{ij} S_i S_j$, where $S_i = \sum_\nu x_{i\nu}$,
$a_{ii} = (p - 1)/pn$, and $a_{ij} = -1/pn$ $(i \neq j)$, to see that the condition that the
maximum of the absolute values of the elements of the matrix of $q_{1n}$ approaches
zero as $n$ increases is satisfied. Hence, if the $x_{i\nu}$ satisfy the condition $\mathcal{L}$, the limiting
distribution of $m_1 v_n$ is $G(v; p - 1, 1)$.

Clearly, if only the rank $m_{2n}$ of $q_{2n}$ increases as $n$ increases, the rank $m_1$ of $q_{1n}$
being constant, and if the maximum of the absolute values of the elements of
the matrix of $q_{1n}$ also approaches zero as $n$ increases, then $v_n$ will have a limiting
distribution which is the analysis of variance distribution, and the limiting
distribution of $q_{1n}/(q_{1n} + q_{2n})$ will be the correlation ratio distribution.

(c). Periodogram analysis. We need only remark that the linear functions
which are used in the analysis of the Schuster periodogram$^{26}$ meet all the
requirements of Corollary I if the $x_\nu$ are independently distributed with zero means and
constant variances and satisfy the condition $\mathcal{L}$. Consequently the large sample
theory of the Schuster periodogram is the same for non-normal as it is for
normal distributions.

(d). Multivariate analysis. We shall assume that the random vectors
$X_1, \ldots, X_n$ ($X_\nu$ has components $x_{1\nu}, \ldots, x_{p\nu}$), are independently distributed, that
(2.3) and (2.4) are satisfied, and that the condition $\mathcal{L}$ is satisfied. For any
fixed n and a we shall call the determinant D n a of the forms (3.5) a generalized 
sum of squares, and the determinant V n a of the elements K, Jm* a generalized 
variance. We shall say that Dg and Vg have rank mg and that Dl and Vk
have rank Uk n . If mg is constant, and if (3.7) and (3.8) are true then clearly
the limiting distribution of Dg is the distribution of the generalized variance 
of mg vector observations 27 from a normal distribution, with zero means and 
covariance parameters <r,,' . Under the same conditions, the limiting distri¬ 
bution of Dp / Vk is the distribution of the generalized variance of nig vector 
observations from a normal distribution with zero means and covariance pa¬ 
rameters . Many other similar limiting distributions are immediately 
derivable. 

Before completing our discussion of the limiting distributions of statistics
occurring in multivariate analysis, we shall state a theorem on limiting
distributions which is an obvious generalization of a theorem of Doob, [4, p. 166].

Suppose that the random variables g(ri)Xi n , ■ , g(n)X Pn have a distribution 

function F(g(n)Xu , • ■ • , g(n)X pn ) which is such that 

lim F(gin)X in , • ■ •, g(n)X P „) = F(X i, ■ • ■, X v ), 

n~*x> 

where F(X i, ■ • • , X v ) is a continuous distribution function, and suppose that 
X ltl converges in probability to the real number £,. For example, if x n — 


26 The theory of the Schuster periodogram is given by Fisher [7].
27 See Wilks, [18, p. 476] or Madow, [15, pp. 481, 484].

$\sum_\nu x_\nu / n$ where $E(x_\nu) = 0$, $E(x_\nu^2) = 1$, and $\mathcal{L}$ is satisfied, then $\bar{x}_n$ converges to
zero with probability one, and $\sqrt{n}\,\bar{x}_n$ has a limiting distribution which is
normal with zero mean and unit variance, i.e.

lim j P{\/ r n% n < a;} — N{X\ 1) | = 0. 

T) * X> 


Theorem VII. Let $\varphi_j(t_1, \ldots, t_p)$ be a function of $t_1, \ldots, t_p$ defined in a
neighborhood $N$ of $\xi_1, \ldots, \xi_p$, which, together with its $(k_j + 1)$-th partial
derivatives is continuous in $N$. Suppose that $k_j$ is the least value of $r_j$ such that the
random variables$^{28}$

fo(n)r'[E (x, n - ’ ^ -] <r/> 

have a joint limiting distribution function D(x i, ,x,). Then the random 

variables {g{ri)f ! [w{xi n , • , x pn ) — <pj{£i , • • • , | P )] have a joint limiting distri¬ 
bution which is given by D(x\, • , x,). The value k( is greater than or equal,to 
the minimum value for which not all the partial derivatives of order k/ vanish at 

i * ■ ■ i 

The proof is almost word for word that of Doob, the only difference being 
the removal of the specializing words. 

We now consider the limiting distribution of the ratio of generalized sums of
squares $L_n$ which is defined by


In = 


Dl 

Dbi 


where D " + 1 is the determinant of the forms b”,k + b^i = b? } . It has been 

shown that 20 


L = TT Xi± 

" V TThi’ 

where Y", , (j = k, k + 1), is a ratio of generalized sums of squares 

(r,s~ 1, • • •, i) u, v = 1, • •., t - 1; bow = D> 


v n _ I ^ >r ’’ I 

Jii - 


ib: 


uvj I 


Since Y,,jm ]n converges with the probability one to | <r„ |/| a uv |, and since, 
by Corollarv VIII the joint limiting distribution of the m*+i „ IS 


28 See Goursat-Hedrick, [10, p. 107] for a statement of the Taylor expansion of functions
of several variables, which wc Use here, by — ij v ) ; g m0an (; the value of 

3 *>/(*i , , x p ) 


9f. 


9t, 


at the point £i, .. , £ p . 


29 See Madow, [15, p. 485].
II G(x t ; m, 1) it follows, by Theorem VII, that the joint limiting distribution of

l 

the ratios of generalized sums of squares 

Ya 


is 

II G(x, ; iirti , 1) 

t 

and that the limiting distribution of wu+i „(1 — L n ) is'" 1 

G(x] prrh , 1). 

In a following paper, these results will be extended to quadratic forms in 
non-central random variables. 

5. Summary. In Section 2, Theorem I, we stated a very general form of the
Laplace-Liapounoff theorem based on the Lindeberg condition. In four
corollaries, this theorem was shown to provide joint limiting distributions for
systems of linear forms which are such that the maximum of the absolute values
of their coefficients converges to zero with an increase in the size of the sample
if the coefficients are constants, and converges in probability to zero with an
increase in the size of the sample if the coefficients are themselves random
variables. It was shown that under certain conditions functions of several
random variables, which are such that each function is a linear function of
certain random variables for fixed values of random variables of lower index,
also have a normal multivariate limiting distribution.

These results were extended to include limiting distributions of quadratic
and bilinear forms in Section 3. The method of extension was to show that
necessary and sufficient conditions for the existence of systems of linear forms
satisfying the conditions of Section 2 are provided by rather simple conditions,
the most important of which is that the greatest of the absolute values of the
elements of the matrices of the quadratic and bilinear forms approaches zero if
the size of the sample increases, the ranks of the forms remaining unaltered.
This led to the theorem that quadratic and bilinear forms having such
matrices have $\chi^2$, or covariance, or Wishart’s distribution as limiting distributions.
It was then shown, in Theorem IV, that if the rank of the sum of the matrices
of the quadratic and bilinear forms is equal to the sum of the ranks of the
matrices, and if certain of these ranks do not change as the size of the sample
increases, then the system of quadratic and bilinear forms has Wishart’s
distribution in the limit provided the other conditions are met. These results

30 A generalization of Wilks’ result, [19, p. 323] to the case where the variates are not
assumed to have a normal multivariate distribution may readily be obtained.

were then extended in Theorem V to one of the cases occurring when the
coefficients of the forms are themselves random variables.

Several simple illustrations of the uses of the methods were given in Section 4.
It was shown that the analysis of variance ratios, and statistics occurring
in the theory of multivariate statistical analysis, have the same limiting
distributions which they would have had if their variables had been normally and
independently distributed.

ON A TEST WHETHER TWO SAMPLES ARE FROM THE SAME POPULATION$^1$

By A. Wald$^2$ and J. Wolfowitz

1. The Problem.$^3$ Let $X$ and $Y$ be two independent stochastic variables
about whose cumulative distribution functions nothing is known except that
they are continuous. Let $x_1, x_2, \ldots, x_m$ be a set of $m$ independent observations
on $X$ and let $y_1, \ldots, y_n$ be a set of $n$ independent observations on $Y$. It
is desired to test the hypothesis (the null hypothesis) that the distribution
functions of $X$ and $Y$ are identical.

An important step in statistical theory was made when “Student” proposed
his ratio of mean to standard deviation for a similar purpose. In the problem
treated by “Student” the distribution functions were assumed to be of known
(normal) form and completely specified by two parameters. It is clear that in
the problem to be considered here the distributions cannot be specified by any
finite number of parameters.

It might nevertheless be argued that by virtue of the limit theorems of
probability theory, “Student’s” ratio might be used in our problem for large
samples. Such a procedure is open to very serious objections. The population
distributions may be of such form (e.g., Cauchy distribution) that the limit
theorems do not apply. Furthermore, the distributions of $X$ and $Y$ may be
radically different and yet have the same first two moments; clearly “Student’s”
ratio will not distinguish between two such distributions.

The Pearson contingency coefficient is a useful test specifically designed for
the problem we are discussing here, but one which also possesses some
disadvantages. The location of the class intervals is to a considerable extent
arbitrary. In order to use the $\chi^2$ distribution, the numbers in each class interval
must not be small; often this can be done only by having large class intervals,
thus entailing a loss of information.

2. Preliminary remarks. Denote by $P\{X < x\}$ the probability of the relation
in braces. Let $f(x)$ and $g(x)$ be the distribution functions of $X$ and $Y$
respectively; e.g., $P\{X < x\} = f(x)$. Throughout this paper we shall assume
that $f(x)$ and $g(x)$ are continuous.

Let the set of $m + n$ elements $x_1, \ldots, x_m$ and $y_1, \ldots, y_n$ be arranged in


1 Presented to the Institute of Mathematical Statistics at Philadelphia, December 27,
1939.

2 Research under a grant-in-aid from the Carnegie Corporation of New York.

3 The authors are indebted to Prof. S. S. Wilks for proposing this problem to them.

ascending order of magnitude, and let the sequence be designated by $Z$, thus:
$Z = z_1, z_2, \ldots, z_{m+n}$, where $z_1 < z_2 < \cdots < z_{m+n}$. ($f(x)$ and $g(x)$ were
assumed to be continuous. Hence the probability is 0 that $z_i = z_{i+1}$ and
therefore we may exclude this case.) Let $V = v_1, v_2, \ldots, v_{m+n}$ be a sequence
defined as follows: $v_i = 0$ if $z_i$ is a member of the set $x_1, \ldots, x_m$ and $v_i = 1$ if $z_i$
is a member of the set $y_1, \ldots, y_n$. It is easy to show that any statistic $S$
used to test the null hypothesis should be invariant under any continuous,
reciprocally one-to-one transformation of the real axis. That is to say, if
$t' = \varphi(t)$ is any such transformation, then

(1) $$S(x_1, \ldots, x_m, y_1, \ldots, y_n) = S(\varphi(x_1), \ldots, \varphi(x_m), \varphi(y_1), \ldots, \varphi(y_n)).$$

The reason for this requirement on $S$ is the fact that the transformed stochastic
variables $X' = \varphi(X)$ and $Y' = \varphi(Y)$ are continuous and have identical
distributions if and only if $X$ and $Y$ have identical distributions. Hence $S$ must be
a function of $V$ only, with the added restriction that $S(V) = S(V')$, where
$V' = v_{m+n}, v_{m+n-1}, \ldots, v_1$. For if $S$ were a function of $x_1, \ldots, x_m$,
$y_1, \ldots, y_n$ which cannot be expressed as a function of $V$ alone, then there
exists a continuous reciprocally one-to-one transformation $t' = \varphi(t)$ such that
(1) is not true. On the other hand, any continuous reciprocally one-to-one
transformation of the entire line into itself is monotonic and hence either leaves $V$
invariant or else transforms it into $V'$.
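The invariance argument can be illustrated on a small data set. The sketch below is in Python and is not part of the paper: the helper name `label_sequence` and the sample values are assumptions chosen for illustration. It builds $V$ from two samples and shows that an increasing transformation leaves $V$ unchanged while a decreasing one reverses it.

```python
import math
from typing import List, Sequence


def label_sequence(xs: Sequence[float], ys: Sequence[float]) -> List[int]:
    """Return V: 0 where the pooled, sorted observation is an x, 1 where it is a y."""
    pooled = [(x, 0) for x in xs] + [(y, 1) for y in ys]
    pooled.sort(key=lambda pair: pair[0])  # ties are assumed absent (continuous laws)
    return [label for _, label in pooled]


# Hypothetical observations.
xs = [0.3, 1.7, 2.2]
ys = [0.9, 1.1, 3.5, 4.0]

v = label_sequence(xs, ys)
# An increasing transformation (exp) keeps the order of the pooled sample.
v_increasing = label_sequence([math.exp(x) for x in xs], [math.exp(y) for y in ys])
# A decreasing transformation (t -> -t) reverses the order.
v_decreasing = label_sequence([-x for x in xs], [-y for y in ys])

print(v)
print(v_increasing == v)
print(v_decreasing == v[::-1])
```

Any statistic that depends on the data only through $V$ and takes the same value on $V$ and its reversal therefore satisfies (1).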

3. Previous results. In an interesting paper on this problem W. R. Thompson
[1] proceeds as follows. Let the sets $x_1, \ldots, x_m$ and $y_1, \ldots, y_n$ be ordered in
ascending order of magnitude, thus: $x_{p_1}, x_{p_2}, \ldots, x_{p_m}$ and $y_{p'_1}, y_{p'_2}, \ldots, y_{p'_n}$,
where $x_{p_1} < x_{p_2} < \cdots < x_{p_m}$ and $y_{p'_1} < y_{p'_2} < \cdots < y_{p'_n}$. Let $P\{x_{p_k} < y_{p'_{k'}}\}$
denote the probability of the relation in braces under the null hypothesis
($f(x) \equiv g(x)$). This probability is shown to be independent of $f(x)$ and the relation

(2) $$P\{x_{p_k} < y_{p'_{k'}}\} = \psi(m, n, k, k')$$

holds, where the right member, which is given explicitly by Thompson, is a
function only of the arguments exhibited. To make a test of the null hypothesis
with, say, a 5% level of significance, this writer proposes to choose $k$ and $k'$
so that $\psi(m, n, k, k') = .05$. The test would then consist of noticing whether
$x_{p_k} < y_{p'_{k'}}$ or not. In the former case the null hypothesis is to be considered
as disproved.

It is clear that this test cannot be very efficient, ignoring as it does so many
of the relations among the observations. Except under certain rather narrow
restrictions on the admissible alternatives, for example, that $g(x) = f(x + c)$,
where $c$ is an arbitrary constant, the test suffers the further defect of not being
“consistent” in a way which will be discussed below. Hence the test suggested
by Thompson can scarcely be regarded as a satisfactory solution of the problem.
This criticism, of course, does not apply to those sections of Thompson’s paper
which deal with the question of estimating the so-called normal range.

4. The statistic U. A subsequence $v_{s+1}, v_{s+2}, \ldots, v_{s+r}$ of $V$ (where $r$ may
also be 1) will be called a “run” if $v_{s+1} = v_{s+2} = \cdots = v_{s+r}$ and if $v_s \neq v_{s+1}$
when $s > 0$ and if $v_{s+r} \neq v_{s+r+1}$ when $s + r < m + n$. For example, $V =$
1, 0, 0, 1, 1, 0 contains the following runs: 1; 0, 0; 1, 1; 0. The statistic$^4$ $U$
defined as the number of runs in $V$ seems a suitable statistic for testing the
hypothesis that $f(x) \equiv g(x)$. In the event that the latter identity holds, the
distribution of $U$ is independent of $f(x)$. A difference between $f(x)$ and $g(x)$
tends to decrease $U$. $U$ is consistent in a sense which will be discussed below.
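The definition of a run translates directly into a count of the places where consecutive elements of $V$ differ. A minimal Python sketch follows; the function names `pooled_labels` and `count_runs` are illustrative assumptions, not notation from the paper.

```python
from typing import List, Sequence


def pooled_labels(xs: Sequence[float], ys: Sequence[float]) -> List[int]:
    """Build V from two samples: 0 marks an observation on X, 1 an observation on Y."""
    pooled = [(x, 0) for x in xs] + [(y, 1) for y in ys]
    pooled.sort(key=lambda pair: pair[0])
    return [label for _, label in pooled]


def count_runs(v: Sequence[int]) -> int:
    """U = number of maximal blocks of equal consecutive elements of v."""
    if not v:
        return 0
    # Every change of label starts a new run; the first element starts run 1.
    return 1 + sum(1 for a, b in zip(v, v[1:]) if a != b)


# The example from the text: runs are 1; 0, 0; 1, 1; 0.
print(count_runs([1, 0, 0, 1, 1, 0]))
# Two small hypothetical samples.
print(count_runs(pooled_labels([0.1, 0.4], [0.2, 0.3])))
```

Counting label changes rather than building the runs explicitly keeps the computation linear in $m + n$ after the initial sort.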

In order to derive the distribution of $U$ under the null hypothesis, we first
note that all the $\frac{(m+n)!}{m!\,n!}$ $(= {}^{m+n}C_m)$ possible sequences $V$ have the same
probability $\frac{m!\,n!}{(m+n)!}$. To see this, consider the sequence $V$ where $v_i = 0$
$(i = 1, 2, \ldots, m)$ and $v_i = 1$ $(i = m + 1, m + 2, \ldots, m + n)$. Clearly the
probability of the sequence is

$$q = \frac{m(m-1) \cdots 1 \cdot n(n-1) \cdots 1}{(m+n)(m+n-1) \cdots (n+1)\, n(n-1) \cdots 1}.$$

Furthermore, the probability of any other sequence is equal to the product of
the factors in the numerator of $q$ taken in a different order, divided by the
product of the factors in the denominator taken in the same order. The
quotient is, of course, $= q$.

Let $e_0$ be the number of runs in $V$ whose elements are 0 and let $e_1$ be the
number of runs whose elements are 1. Obviously $U = e_0 + e_1$. Let the runs
of each kind be arranged in the ascending order of the indices of the $v_i$. Let $r_{0j}$
be the number of elements 0 in the $j$-th run of that kind $(j = 1, 2, \ldots, e_0)$ and
let $r_{1j}$ be the number of elements 1 in the $j$-th run of that kind $(j = 1, 2, \ldots, e_1)$.
The following relations obviously hold:

(3) $$\sum_{j=1}^{e_0} r_{0j} = m,$$

(4) $$\sum_{j=1}^{e_1} r_{1j} = n,$$

(5) $$1 \le e_0 \le m, \qquad 1 \le e_1 \le n,$$

(6) $$|e_0 - e_1| \le 1.$$

4 When this paper was already in proof, our attention was called to a paper by W. L.
Stevens, entitled “Distribution of groups in a sequence of alternatives,” Annals of
Eugenics, Vol. 9 (1939). There a statistic, which is essentially the U statistic, is proposed
for a problem different from that considered by us and the distribution of U is obtained
in a different manner. However, the application of the U statistic for the purpose herein
described, the proof of consistency and the other results of our paper are not contained
in it.

Hence if $U = 2k$, then $e_0 = e_1 = k$, and if $U = 2k - 1$, then either $e_0 = k$,
$e_1 = k - 1$ or $e_0 = k - 1$, $e_1 = k$. The element $v_1$ of $V$ together with the
numbers $r_{01}, r_{02}, \ldots, r_{0e_0}, r_{11}, r_{12}, \ldots, r_{1e_1}$ completely determines the sequence $V$
whose probability is $q$.

Without loss of generality we may assume that $m \le n$. If $U = 2k$,
$1 \le k \le m$, $v_1 = 0$, any two sequences of $k$ positive numbers each may
constitute a sequence of $r_{01}, \ldots, r_{0k}, r_{11}, \ldots, r_{1k}$ provided only that (3) and (4)
are satisfied. The number of sequences $r_{01}, r_{02}, \ldots, r_{0k}$ which satisfy (3) is
the coefficient of $a^m$ in the purely formal expansion of

$$(a + a^2 + a^3 + \cdots)^k = \frac{a^k}{(1 - a)^k}$$

and hence is ${}^{m-1}C_{k-1}$. Similarly the number of sequences $r_{11}, r_{12}, \ldots, r_{1k}$
which satisfy (4) is found to be ${}^{n-1}C_{k-1}$. Bearing in mind the case $U = 2k$,
$v_1 = 1$, we obtain

(7) $$P\{U = 2k\} = \frac{2 \cdot {}^{m-1}C_{k-1} \cdot {}^{n-1}C_{k-1}}{{}^{m+n}C_m} \qquad (k = 1, 2, \ldots, m),$$

where the left member denotes the probability of the relation in braces under
the null hypothesis. In a similar manner we obtain

(8) $$P\{U = 2k - 1\} = \frac{{}^{m-1}C_{k-1} \cdot {}^{n-1}C_{k-2} + {}^{m-1}C_{k-2} \cdot {}^{n-1}C_{k-1}}{{}^{m+n}C_m} \qquad (k = 2, \ldots, m + 1),$$

with the proviso that ${}^{a}C_b = 0$ if $a < b$.
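Formulas (7) and (8) can be checked against a direct enumeration of all ${}^{m+n}C_m$ equally likely sequences for small $m$ and $n$. The following Python sketch is offered under stated assumptions: the function names are hypothetical, both sample sizes are taken to be at least 1, and the loops run to $\min(m, n)$ so that the restriction $m \le n$ is not needed.

```python
from fractions import Fraction
from itertools import combinations
from math import comb
from typing import Dict


def runs_pmf(m: int, n: int) -> Dict[int, Fraction]:
    """Null distribution of U from formulas (7) and (8); assumes m, n >= 1."""
    total = comb(m + n, m)
    pmf: Dict[int, Fraction] = {}
    for k in range(1, min(m, n) + 1):  # even values U = 2k
        pmf[2 * k] = Fraction(2 * comb(m - 1, k - 1) * comb(n - 1, k - 1), total)
    for k in range(2, min(m, n) + 2):  # odd values U = 2k - 1
        ways = (comb(m - 1, k - 1) * comb(n - 1, k - 2)
                + comb(m - 1, k - 2) * comb(n - 1, k - 1))
        if ways:  # math.comb(a, b) is 0 when b > a, matching the proviso
            pmf[2 * k - 1] = Fraction(ways, total)
    return pmf


def runs_pmf_bruteforce(m: int, n: int) -> Dict[int, Fraction]:
    """Same distribution by listing every placement of the m zeros."""
    total = comb(m + n, m)
    counts: Dict[int, int] = {}
    for zero_positions in combinations(range(m + n), m):
        zeros = set(zero_positions)
        v = [0 if i in zeros else 1 for i in range(m + n)]
        u = 1 + sum(1 for a, b in zip(v, v[1:]) if a != b)
        counts[u] = counts.get(u, 0) + 1
    return {u: Fraction(c, total) for u, c in counts.items()}


print(runs_pmf(3, 4) == runs_pmf_bruteforce(3, 4))
print(sum(runs_pmf(3, 4).values()))
```

Exact rational arithmetic (`fractions.Fraction`) is used so that the comparison with the enumeration is an equality rather than a floating-point approximation.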

We shall now briefly indicate a method of obtaining the mean $E(U)$ and
variance $\sigma^2(U)$ of $U$. For example, $E(U)$ may be obtained by performing
several summations of the type

m—1 

(9) 

t-0 

It is easy to verify that the expression (9) is the term free of a in the purely 
formal expansion in a of; 

(10) (m - 1).(1 + a) n ~ 2 .a.(l + ~J~\ 


and hence is

(11) $$(m - 1) \cdot {}^{m+n-3}C_{n-2}.$$

The other summations required for the mean and variance can be carried out
in a similar manner. We shall omit these tedious calculations. The results are:

(12) $$E(U) = \frac{2mn}{m + n} + 1,$$

(13) $$\sigma^2(U) = \frac{2mn(2mn - m - n)}{(m + n)^2 (m + n - 1)}.$$

The critical region for testing the null hypothesis on a level of significance $\beta$
is given by the inequality $U < u_0$, where $u_0$ is a function of $m$ and $n$ such that
$P\{U < u_0\} = \beta$.
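The closed forms (12) and (13) can be compared with moments computed directly from the exact distribution (7), (8). The Python sketch below does this and also locates a critical value. Two points are assumptions rather than statements of the paper: the function names are hypothetical, and because $U$ is discrete the critical value is taken here as the largest $u$ with $P\{U \le u\}$ not exceeding $\beta$.

```python
from fractions import Fraction
from math import comb
from typing import Dict, Optional, Tuple


def runs_pmf(m: int, n: int) -> Dict[int, Fraction]:
    """Null distribution of U from formulas (7) and (8); assumes m, n >= 1."""
    total = comb(m + n, m)
    pmf: Dict[int, Fraction] = {}
    for k in range(1, min(m, n) + 1):
        pmf[2 * k] = Fraction(2 * comb(m - 1, k - 1) * comb(n - 1, k - 1), total)
    for k in range(2, min(m, n) + 2):
        ways = (comb(m - 1, k - 1) * comb(n - 1, k - 2)
                + comb(m - 1, k - 2) * comb(n - 1, k - 1))
        if ways:
            pmf[2 * k - 1] = Fraction(ways, total)
    return pmf


def mean_and_variance(pmf: Dict[int, Fraction]) -> Tuple[Fraction, Fraction]:
    """First two moments computed directly from the probabilities."""
    mean = sum(u * p for u, p in pmf.items())
    var = sum((u - mean) ** 2 * p for u, p in pmf.items())
    return mean, var


def mean_formula(m: int, n: int) -> Fraction:
    return Fraction(2 * m * n, m + n) + 1  # formula (12)


def variance_formula(m: int, n: int) -> Fraction:
    return Fraction(2 * m * n * (2 * m * n - m - n), (m + n) ** 2 * (m + n - 1))  # (13)


def critical_value(m: int, n: int, beta: float) -> Optional[int]:
    """Largest u with P{U <= u} <= beta (assumed convention); None if there is none."""
    pmf = runs_pmf(m, n)
    tail = Fraction(0)
    best = None
    for u in sorted(pmf):
        tail += pmf[u]
        if tail > beta:
            break
        best = u
    return best


print(mean_and_variance(runs_pmf(6, 9)) == (mean_formula(6, 9), variance_formula(6, 9)))
print(critical_value(10, 10, 0.05))
```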


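5. The asymptotic distribution of U. Let $m/n = \alpha$, a positive constant.
Then, as $m \to \infty$,

$$E(U) \sim \frac{2m}{1 + \alpha}, \qquad \sigma^2(U) \sim \frac{4\alpha m}{(1 + \alpha)^3}.$$

Theorem I. If $t$ is any real number, the probability of the relation

$$U < \frac{2m}{1 + \alpha} + 2\left[\frac{\alpha m}{(1 + \alpha)^3}\right]^{1/2} t$$

converges uniformly in $t$ to

$$\frac{1}{\sqrt{2\pi}} \int_{-\infty}^{t} e^{-w^2/2}\, dw$$

as $m \to \infty$.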
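The quality of the normal approximation in Theorem I can be inspected by comparing the exact distribution function of $U$ with the limiting normal law. The Python sketch below is illustrative only: the function names are hypothetical, the exact probabilities come from (7) and (8), and the comparison is made at the integer points $u$ using $P\{U \le u\}$, a choice of convention that is an assumption here.

```python
from math import comb, erf, sqrt
from typing import Dict


def runs_pmf(m: int, n: int) -> Dict[int, float]:
    """Null distribution of U from formulas (7) and (8) as floats; assumes m, n >= 1."""
    total = comb(m + n, m)
    pmf: Dict[int, float] = {}
    for k in range(1, min(m, n) + 1):
        pmf[2 * k] = 2 * comb(m - 1, k - 1) * comb(n - 1, k - 1) / total
    for k in range(2, min(m, n) + 2):
        ways = (comb(m - 1, k - 1) * comb(n - 1, k - 2)
                + comb(m - 1, k - 2) * comb(n - 1, k - 1))
        if ways:
            pmf[2 * k - 1] = ways / total
    return pmf


def normal_cdf(z: float) -> float:
    """Standard normal distribution function."""
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))


def max_gap(m: int, n: int) -> float:
    """Largest |P{U <= u} - Phi((u - center) / scale)| over the support of U."""
    alpha = m / n
    center = 2 * m / (1 + alpha)                      # asymptotic mean
    scale = 2 * sqrt(alpha * m / (1 + alpha) ** 3)    # asymptotic standard deviation
    pmf = runs_pmf(m, n)
    running, gap = 0.0, 0.0
    for u in sorted(pmf):
        running += pmf[u]
        gap = max(gap, abs(running - normal_cdf((u - center) / scale)))
    return gap


for size in (20, 50, 200):
    print(size, max_gap(size, size))
```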

The proof of this theorem is essentially the same as the classical proof that
the binomial law converges to the normal distribution (see, for example, Fréchet
[2], p. 89) and it will be unnecessary to give the details. Since the asymptotic
distribution of the subpopulation of even $U$ is the same as that of odd $U$, it
will be sufficient to consider only the right member of (7). Let $m' = m - 1$,
$n' = n - 1$, and $k' = k - 1$. We make the substitution

(14) $$w = \frac{k' - \dfrac{m'}{1 + \alpha'}}{\sqrt{m'}},$$

where

(15) $$\alpha' = \frac{m'}{n'},$$

and evaluate the factorials by Stirling’s formula. We shall give here only the
results of successive simplifications. At each step we shall omit the factors
free of $k$ or $w$, since their product may be reconstructed from the final
exponential form. Thus instead of the right member of (7) we can consider the
expression:

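(16) $${}^{m-1}C_{k-1} \cdot {}^{n-1}C_{k-1}.$$

Omitting factors free of $k$, we get

(17) $$\frac{1}{(k-1)!\,(m-k)!\,(k-1)!\,(n-k)!}$$

and by Stirling’s formula, since $k$ and $m$ are both large:

(18) $$\left[ k'^{\,(2k'+1)}\, (m'-k')^{(m'-k'+\frac{1}{2})}\, (n'-k')^{(n'-k'+\frac{1}{2})} \right]^{-1}.$$

Now apply (14). We obtain

(19) $$\left[ \left(\sqrt{m'}\,w + \frac{m'}{1+\alpha'}\right)^{2\sqrt{m'}w + \frac{2m'}{1+\alpha'} + 1} \left(\frac{m'\alpha'}{1+\alpha'} - \sqrt{m'}\,w\right)^{\frac{m'\alpha'}{1+\alpha'} - \sqrt{m'}w + \frac{1}{2}} \left(\frac{m'}{\alpha'(1+\alpha')} - \sqrt{m'}\,w\right)^{\frac{m'}{\alpha'(1+\alpha')} - \sqrt{m'}w + \frac{1}{2}} \right]^{-1}.$$

Dividing inside the parentheses by $\frac{m'}{1+\alpha'}$, $\frac{m'\alpha'}{1+\alpha'}$, and $\frac{m'}{\alpha'(1+\alpha')}$, respectively,
and again omitting factors free of $w$, we get

(20) $$\left[ \left(1 + \frac{(1+\alpha')w}{\sqrt{m'}}\right)^{2\sqrt{m'}w + \frac{2m'}{1+\alpha'} + 1} \left(1 - \frac{(1+\alpha')w}{\alpha'\sqrt{m'}}\right)^{\frac{m'\alpha'}{1+\alpha'} - \sqrt{m'}w + \frac{1}{2}} \left(1 - \frac{\alpha'(1+\alpha')w}{\sqrt{m'}}\right)^{\frac{m'}{\alpha'(1+\alpha')} - \sqrt{m'}w + \frac{1}{2}} \right]^{-1}.$$

Taking logarithms, expanding in powers of $w/\sqrt{m'}$ and neglecting terms of the third
and higher orders, the results are

(21) $$-\left(2\sqrt{m'}\,w + \frac{2m'}{1+\alpha'} + 1\right)\left(\frac{(1+\alpha')w}{\sqrt{m'}} - \frac{(1+\alpha')^2 w^2}{2m'}\right) - \left(\sqrt{m'}\,w - \frac{m'\alpha'}{1+\alpha'} - \frac{1}{2}\right)\left(\frac{(1+\alpha')w}{\alpha'\sqrt{m'}} + \frac{(1+\alpha')^2 w^2}{2\alpha'^2 m'}\right) - \left(\sqrt{m'}\,w - \frac{m'}{\alpha'(1+\alpha')} - \frac{1}{2}\right)\left(\frac{\alpha'(1+\alpha')w}{\sqrt{m'}} + \frac{\alpha'^2(1+\alpha')^2 w^2}{2m'}\right),$$

which equals

(22) $$-\frac{(1+\alpha')^3}{2\alpha'}\, w^2 + O(m'^{-1/2}).$$

The proof of the fact that the distribution of $w$ converges uniformly to the
normal distribution with zero mean and variance $\frac{\alpha'}{(1+\alpha')^3}$ can be carried out
in the same way as the classical proof that the binomial law converges to the
normal distribution.

It is obvious that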


has the same distribution as $w$. From this and from the fact that $U = 2k$ or
$2k - 1$ Theorem I follows.

In using conventional tables of the Gaussian function to make tests of
significance on $U$ when $m$ and $n$ are large, the reader is urged not to forget that the
critical region of $U$ lies in only one tail of the curve.

6. An example. We give here a simple example illustrating the use of the 
statistic U and Theorem I. 

Suppose 50 observations were made on $X$ and 50 observations on $Y$. Suppose
further that these observations are arranged in ascending order and that the $i$-th
element of this sequence is said to have the rank $i$. The observations on $X$
occupy the following ranks: 1, 5, 6, 7, 12, 13, 14, 15, 16, 17, 19, 20, 21, 25, 26,
27, 28, 31, 32, 38, 42, 43, 44, 45, 50, 51, 52, 53, 54, 56, 57, 58, 62, 63, 64, 65,
68, 69, 75, 79, 80, 81, 86, 87, 89, 90, 91, 93, 94, 95.

The observations on $Y$ occupy the remaining ranks.

In this case, $U = 34$.

For $m = n = 50$,

$E(U) = 51$,
$\sigma^2(U) = 24.747$.

The probability of getting 34 runs or less when the distribution functions of X 
and Y are continuous and identical is therefore less than 5-10 . 
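The figures in this example can be reproduced from the ranks alone. The Python sketch below counts the runs, evaluates (12) and (13), and computes the one-tailed normal probability suggested by Theorem I. The variable and function names are illustrative assumptions, and the tail probability uses the plain normal approximation without any continuity correction.

```python
from math import erf, sqrt
from typing import List

# Ranks occupied by the 50 observations on X, as listed in the example.
X_RANKS: List[int] = [
    1, 5, 6, 7, 12, 13, 14, 15, 16, 17, 19, 20, 21, 25, 26,
    27, 28, 31, 32, 38, 42, 43, 44, 45, 50, 51, 52, 53, 54, 56, 57, 58,
    62, 63, 64, 65, 68, 69, 75, 79, 80, 81, 86, 87, 89, 90, 91, 93, 94, 95,
]


def runs_from_ranks(x_ranks: List[int], total: int) -> int:
    """Number of runs in V when the X observations hold the given ranks 1..total."""
    x_set = set(x_ranks)
    v = [0 if rank in x_set else 1 for rank in range(1, total + 1)]
    return 1 + sum(1 for a, b in zip(v, v[1:]) if a != b)


m = n = 50
u = runs_from_ranks(X_RANKS, m + n)
mean = 2 * m * n / (m + n) + 1                                     # formula (12)
var = 2 * m * n * (2 * m * n - m - n) / ((m + n) ** 2 * (m + n - 1))  # formula (13)
z = (u - mean) / sqrt(var)
p_lower_tail = 0.5 * (1.0 + erf(z / sqrt(2.0)))  # one tail only, as the text urges

print(u, mean, round(var, 3))
print(z, p_lower_tail)
```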

7. Consistency. We shall say that a test is “consistent” if the probability
of rejecting the null hypothesis when it is false (i.e., the complement of the
probability of a type II error, cf. Neyman and Pearson, [3]) approaches one
as the sample number approaches infinity. In the literature of statistics a
function of the observations which converges stochastically to a population
parameter as the sample number approaches infinity is called a “consistent”
statistic. If a test of a hypothesis about a population parameter is made by a
proper use of a consistent (statistic) estimate of the parameter, the test will
be consistent also according to our definition, which thus furnishes an extension
of the idea of consistency to the case where the alternatives to the null
hypothesis cannot be specified by a finite number of parameters.

It is obvious that consistency ought to be a minimal requirement of any good
test. It is the purpose of this section to prove that, subject to some slight and,
from the practical statistical point of view, unimportant restrictions on the
distribution functions, the test furnished by the statistic $U$ is consistent.
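Consistency can be illustrated, though of course not proved, by simulation. The Python sketch below estimates the rejection rate of the one-tailed runs test under a null case and under one fixed alternative. Everything specific in it is an assumption made for illustration: the uniform distributions, the shift of 0.5, the sample sizes, the seed, and the use of the normal approximation with the 5% point taken as $-1.645$.

```python
import random
from math import sqrt
from typing import Callable, List


def count_runs(xs: List[float], ys: List[float]) -> int:
    """U for two samples: number of runs in the pooled, sorted label sequence."""
    pooled = sorted([(x, 0) for x in xs] + [(y, 1) for y in ys], key=lambda t: t[0])
    labels = [label for _, label in pooled]
    return 1 + sum(1 for a, b in zip(labels, labels[1:]) if a != b)


def rejects(xs: List[float], ys: List[float], z_crit: float = -1.645) -> bool:
    """One-tailed test: reject when U is too small, using the normal approximation."""
    m, n = len(xs), len(ys)
    mean = 2 * m * n / (m + n) + 1
    var = 2 * m * n * (2 * m * n - m - n) / ((m + n) ** 2 * (m + n - 1))
    return (count_runs(xs, ys) - mean) / sqrt(var) < z_crit


def rejection_rate(draw_x: Callable[[random.Random], float],
                   draw_y: Callable[[random.Random], float],
                   m: int, n: int, reps: int, rng: random.Random) -> float:
    """Fraction of simulated sample pairs for which the test rejects."""
    hits = 0
    for _ in range(reps):
        xs = [draw_x(rng) for _ in range(m)]
        ys = [draw_y(rng) for _ in range(n)]
        hits += rejects(xs, ys)
    return hits / reps


def uniform01(rng: random.Random) -> float:
    return rng.random()


def shifted_uniform(rng: random.Random) -> float:
    return rng.random() + 0.5  # hypothetical alternative: uniform on (0.5, 1.5)


rng = random.Random(0)
for size in (10, 50, 200):
    print(size, rejection_rate(uniform01, shifted_uniform, size, size, 200, rng))
print("null", rejection_rate(uniform01, uniform01, 50, 50, 200, rng))
```

Under the alternative the estimated rate should climb toward one as the sample sizes grow, while under the null case it should stay near the nominal level.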

We shall say that the distribution functions $f(x)$ and $g(x)$ satisfy the
condition A, if, for any arbitrarily small positive $\delta$, there exist a finite number of
closed intervals, such that the probability of the sum $I$ of these intervals
is $> 1 - \delta$ according to at least one of the distribution functions $f(x)$ and $g(x)$,
and such that $f(x)$ and $g(x)$ have positive continuous derivatives $f'(x)$ and
$g'(x)$ in $I$.

In all that follows, although $m$ and $n$ are considered as variables, their ratio
$m/n$ is to be a constant, denoted by $\alpha$. Let $\beta > 0$ denote the level of
significance on which the test is to be made, so that, if $f(x) \equiv g(x)$,

(23) $$P\{U < u_0(m)\} = \beta$$

where the critical region for two samples of size $m$ and $n$, respectively, is given by

$$U < u_0(m).$$

Theorem II. If $f(x)$ and $g(x)$ satisfy condition A, and if

(24) $$f(x) \not\equiv g(x),$$

then

(25) $$\lim_{m \to \infty} P\{U < u_0(m)\} = 1.$$

The proof of this theorem will be given in several stages.

Let $E\left(\frac{U}{m}; f; g\right)$ and $\sigma^2\left(\frac{U}{m}; f; g\right)$ denote the mean and variance, respectively,
of $\frac{U}{m}$, when $X$ and $Y$ have the distribution functions $f(x)$ and $g(x)$, respectively,
and the sample numbers are $m$ and $n$. Let the set $x_1, \ldots, x_m$; $y_1, \ldots, y_n$ be
arranged in ascending order of magnitude, thus:

(26) $$Z = z_1, z_2, \ldots, z_{m+n},$$

where $z_1 < z_2 < \cdots < z_{m+n}$. The sequence

(27) $$V = v_1, v_2, \ldots, v_{m+n}$$

is defined as follows: $v_i = 0$ if $z_i$ is a member of the set $x_1, \ldots, x_m$ and $v_i = 1$
if $z_i$ is a member of the set $y_1, \ldots, y_n$.

Lemma 1. If the following are fulfilled:

a) $f(x) \equiv 0$ for $x \le 0$, $f(x) \equiv x$ for $0 \le x \le 1$, $f(x) \equiv 1$ for $x \ge 1$.

b) $g(x) = 0$ for $x \le 0$, $g(x) = 1$ for $x \ge 1$.

c) The derivative $g'(x)$ of $g(x)$ exists, is continuous and positive everywhere in
the interval $0 \le x \le 1$.

d) $k$ is an arbitrary but fixed positive integer. For every $m$, $i_{1m} < i_{2m} < \cdots < i_{km}$
are a set of $k$ positive integers subject only to the restriction that the
least upper bound $\gamma$ of the sequence $\frac{i_{km}}{m + n}$ is less than 1.

Then the expected value $E(v_{i_{1m}} v_{i_{2m}} \cdots v_{i_{km}})$ satisfies the inequality

(28) $$\left| E(v_{i_{1m}} v_{i_{2m}} \cdots v_{i_{km}}) - \prod_{j=1}^{k} \frac{g'(a_{\lambda_{jm}})}{\alpha + g'(a_{\lambda_{jm}})} \right| < \varphi(m)$$

where $\lambda_{jm} = \frac{i_{jm}}{m + n}$ and $a_{\lambda_{jm}}$ $(j = 1, \ldots, k)$ is the root of

(29) $$m\, a_{\lambda_{jm}} + n\, g(a_{\lambda_{jm}}) = \lambda_{jm}(m + n)$$

and $\varphi(m)$ depends only on $m$ and is such that

(30) $$\lim_{m \to \infty} \varphi(m) = 0.$$

It is easy to verify that the root $a_{\lambda_{jm}}$ of (29) exists and is unique.

Proof: It will be sufficient to show that, for any specified set of values of

$$v_{i_{1m}}, \ldots, v_{i_{(r-1)m}}, v_{i_{(r+1)m}}, \ldots, v_{i_{km}} \qquad (r = 1, \ldots, k)$$

the conditional probability $P\{v_{i_{rm}} = 1\}$ of the relation in braces satisfies the
inequality

(31) $$\left| \frac{g'(a_{\lambda_{rm}})}{\alpha + g'(a_{\lambda_{rm}})} - P\{v_{i_{rm}} = 1\} \right| < \psi(m),$$

where $\psi(m)$ depends only on $m$ and is such that

(32) $$\lim_{m \to \infty} \psi(m) = 0.$$

For each $m$ let

(33) $$V'_m = v'_{i_{1m}}, v'_{i_{2m}}, \ldots, v'_{i_{km}}$$

be a fixed sequence whose elements are either 0 or 1. We shall consider the
conditional probability $P\{v_{i_{rm}} = s\}$, $(s = 0, 1)$ of the relation in braces subject
to the condition that

(34) $$v_{i_{jm}} = v'_{i_{jm}}, \qquad (j = 1, 2, \ldots, (r - 1), (r + 1), (r + 2), \ldots, k).$$

Let $a$ and $b$ be two numbers such that $0 < a < b < 1$, and let $m^*$ be a
non-negative integer such that $m^* \le m$, and $m^* \le [\gamma(m + n)]$ where $[\gamma(m + n)]$
denotes the largest integer $\le \gamma(m + n)$. Let $Q_m(a, b, m^*)$ denote the
probability that, if $m^*$ observations are made on $X$ and $[\gamma(m + n)] - m^*$ observations
are made on $Y$, the following conditions will be fulfilled:

(a) the total number of observations $< a$ is exactly $i_{rm} - 1$,

(b) all observations are $< b$,

(c) if the $[\gamma(m + n)]$ observations are arranged in ascending order and if
$v^*_j = 0$ or 1 according as the $j$-th element is an observation on $X$ or on $Y$, then

(35) 

and 

* / 

N.» = V *tm 

0 = 1,2 ,-.., 

r - 1), 

(36) 


O' = r + 1, r + 2 

■ ■ - fc). 


It is easy to see that the probability Po of the simultaneous fulfillment of the 


relations 

(34) and of v lrm = 0 is given by 

(37) 

Po - /YE R ™( a > h > - b) m, ~ l ( 1 - g(b)) n ' dadb 

"0 Jo m• 

where 


(38) 

Rm(a, b, m*) = m C m . nn _ w . ^ (a, b, m*), 

(39) 

m' = m — m*, 

and 


(40) 

n' — n — [y(m + n)} + m*. 


Similarly, the probability Pi of the simultaneous fulfillment of the relations 
(34) and of t u rn — 1 is given by 

(41) Pi - ft L R m (a, b, m*) n'g'{a) (l - b) m '( 1 - gib))"" 1 da db. 

Jo Jo in* 

Then 


f - Q1 = p« 

PK m = 1 } Pi 


Let «o = 2 and nu> = m + « — [y(m + n)] — no . The variables 

fe™ “ d\ rm ), (%cm+n)i - a y ), g(a')) ) converge stochastically to 


zero. 

Let Fo(«) and Fi(e) denote the values of the right members of (37) and (41), 
respectively, if the integration is restricted to the region where a < b, 
I o — a \,„ I < c, | b — o T | < € and the summation is restricted to those values 
of to* for which ~ — < t , Hence, because of tin' alorementioned

n (1 - g(a y )) 

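stochastic convergence, for all sufficiently large $m$

(43) $$|P_s(\epsilon) - P_s| < \epsilon, \qquad s = 0, 1.$$

Since $P_1 > 0$, for sufficiently large $m$, also

(44) $$\left| \frac{P_0(\epsilon)}{P_1(\epsilon)} - \frac{P_0}{P_1} \right| < \epsilon.$$

Since $g(x)$ and $g'(x)$ are continuous in the interval $[0, 1]$ and hence uniformly
continuous, it is clear that

(45) $$\left| \frac{P_0(\epsilon)}{P_1(\epsilon)} - \frac{\alpha}{g'(a_{\lambda_{rm}})} \right| < c\,\epsilon,$$

where $c$ is a fixed constant independent of $m$. From (44) and (45) it follows
easily that, for any arbitrarily small $\epsilon'$,

(46) $$\left| \frac{P_0}{P_1} - \frac{\alpha}{g'(a_{\lambda_{rm}})} \right| < \epsilon'$$

for sufficiently large $m$.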


Since $P\{v_{i_{rm}} = 1\} = \frac{P_1}{P_0 + P_1}$, the required relation (31) follows. This
completes the proof of Lemma 1.

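Lemma 2. If conditions a, b, and c of Lemma 1 are satisfied, then

(47) $$\lim_{m \to \infty} E\left(\frac{U}{m}; f; g\right) = 2 \int_0^1 \frac{g'(x)}{\alpha + g'(x)}\, dx$$

and

(48) $$\lim_{m \to \infty} \sigma^2\left(\frac{U}{m}; f; g\right) = 0.$$

Proof: Since

(49) $$U = 1 + \sum_{j=2}^{m+n} (v_j - v_{j-1})^2$$

we have from Lemma 1,

(50) $$\frac{U}{m} = \frac{1}{m} + \frac{1}{m} \sum_{j=2}^{m+n} (v_j - v_{j-1})^2 = \frac{1 + v_1 + v_{m+n}}{m} + \frac{2}{m} \sum_{j=2}^{m+n-1} v_j - \frac{2}{m} \sum_{j=2}^{m+n} v_{j-1} v_j,$$

$$E\left(\frac{U}{m}\right) = \frac{2}{m} \sum_{j=2}^{m+n} \frac{g'(a_{jm})}{\alpha + g'(a_{jm})} - \frac{2}{m} \sum_{j=2}^{m+n} \left[\frac{g'(a_{jm})}{\alpha + g'(a_{jm})}\right]^2 + \eta(m) + \eta'(\gamma) = \frac{2}{m} \sum_{j=2}^{m+n} \frac{\alpha\, g'(a_{jm})}{(\alpha + g'(a_{jm}))^2} + \eta(m) + \eta'(\gamma),$$

where

(51) $$\lim_{m \to \infty} \eta(m) = \lim_{\gamma \to 1} \eta'(\gamma) = 0$$

and $a_{jm}$ is the root of the equation

(52) $$m\, a_{jm} + n\, g(a_{jm}) = j \qquad (j = 2, \ldots, m + n).$$

From equation (52) it follows that

(53) $$\lim_{m \to \infty} (a_{jm} - a_{(j-1)m})(m + n\, g'(a_{jm})) = 1$$

uniformly in $j$. Since $\gamma$ may be chosen arbitrarily near to 1, the required
result (47) follows easily from (50).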

It remains to consider the variance of $\frac{U}{m}$. The expression

$$\frac{1 + v_1 + v_{m+n}}{m} + \frac{2}{m} \sum_{j=2}^{m+n-1} v_j$$

differs from $\frac{2}{\alpha}$ by at most $\frac{1}{m}$, so that its variance converges to zero with $m \to \infty$.

In order to prove (48), it will be sufficient to show that the variance of

(54) $$W = \frac{1}{m} \sum_{j=2}^{m+n} v_{j-1} v_j$$

goes to zero with increasing $m$. From Lemma 1 it follows that

(55) $$-z(m) < [E(v_i v_j v_k v_l) - E(v_i v_j) E(v_k v_l)] < z(m),$$

where $\lim_{m \to \infty} |z(m)| = 0$, provided only that the integers $i, j, k, l$ are distinct
and $\le \gamma(m + n)$. The variance of $mW$ is the sum of terms of the type occurring
in (55). The number of terms for which $i, j, k, l$ are distinct is of the order $m^2$.
All other terms are of size at most 2 and their number is of the order $m$. Since
the number $\gamma$ may be chosen arbitrarily near to 1, the variance of $W$ converges
to zero with $m \to \infty$.

This proves Lemma 2.

Lemma 3. If conditions a, b, and c of Lemma 1 are fulfilled, and if (24) holds,
then

(56)  $T = \int_0^1 \frac{g'(x)}{\alpha + g'(x)}\,dx < \frac{1}{1 + \alpha}.$


Let $a_1 < a_3$ be any two real numbers and designate $\frac{a_1 + a_3}{2}$ by $a_2$. Let
F(x) be defined as follows:

(57)  $F(a_1) = 0, \qquad F(x) = (x - a_i)\,h_i + F(a_i) \qquad (a_i \le x \le a_{i+1};\ i = 1, 2).$

Let c be defined by

(58)  $F(a_3) = c\,(a_3 - a_1).$

Then it is easy to verify that the maximum of

(59)  $T^* = \int_{a_1}^{a_3} \frac{F'(x)}{\alpha + F'(x)}\,dx$

with respect to $h_1$ and $h_2$, subject to the restrictions that $h_1$ and $h_2$ be non-negative,
and that $a_1$, $a_3$ and c be fixed (c > 0), occurs when and only when

(60)  $h_1 = h_2 = c.$


Now define 


(61) 


and 


= o, 

7 _ <7 (^;) “ g(P^-Di) 

l --—- 


5.= 


1 y U, 

2’ .-1 a l„ ’ 


(»- 1,2, 2';j = 0, 1,2...)- 


Repeated application of the result of the previous paragraph easily gives

(62)  $S_j \ge S_{j+1}.$

From (24) it follows that there exists a positive integer j' such that $S_{j'} > S_{j'+1}$.

Obviously

(63)  $S_0 = \frac{1}{1 + \alpha}$

and

(64)  $\lim_{j\to\infty} S_j = T.$

Hence Lemma 3 is proved.



Proof of Theorem II: Let $\delta_1 > \delta_2 > \cdots > \delta_j > \cdots$ be an arbitrary but fixed
sequence such that $\lim \delta_j = 0$. For $\delta = \delta_j$, let $I_{1j}, \dots, I_{k(j)j}$ be a set of closed
intervals such that no two intervals have an interior point in common and
within which, by condition (A), f'(x) and g'(x) exist, are positive, and continuous.
Let $I_{0j}$ be the complementary set (with respect to the whole line).
(It is easy to see that, if condition (A) is fulfilled, such a system can be constructed.)
Let $U_{ij}$ (i = 1, 2, ..., k(j)) and $U_{0j}$ denote, respectively, the runs
caused by the observations which fall in the intervals $I_{ij}$, $I_{0j}$. Then

(65)  $U - \sum_{i=1}^{k(j)} U_{ij} - U_{0j} \le 2\,k(j).$

From condition (A) it follows that, with a probability arbitrarily close to 1, for
sufficiently large m,

(66)  $U_{0j} < 3\rho\, m\, \delta_j,$


where 


p ■= max 



O' = 1, 2 ■ ■.). 


Let $[a_i \le x \le b_i]$, i = 1, 2, ..., denote the interval $I_{ij}$, and let $m_i$ and $n_i$ denote
the number of observations on X and Y, respectively, which fall in the interval
$I_{ij}$. Then $\frac{m_i}{m}$ and $\frac{n_i}{n}$ converge stochastically with increasing m to $[f(b_i) - f(a_i)]$
and $[g(b_i) - g(a_i)]$, respectively.

Within the interval $I_{ij}$ (i = 1, 2, ..., k) we perform the transformation

(67)  $X^* = f(X), \qquad Y^* = f(Y),$

which leaves $U_{ij}$ invariant. For fixed $m_i$, $n_i$ the relative distribution of $X^*$
is uniform and the relative distribution of $Y^*$ fulfills condition (c) of Lemma 1.

Hence from Lemma 2 we obtain that $\frac{U_{ij}}{m}$ converges stochastically to

(68)  $\lim_{m\to\infty} E\!\left(\frac{U_{ij}}{m}\right) \le \frac{2\,[f(b_i) - f(a_i)]\,[g(b_i) - g(a_i)]}{[g(b_i) - g(a_i)] + \alpha\,[f(b_i) - f(a_i)]}.$

It can be verified that the sum of the second members in (68) over all values i
is less than or equal to $\frac{2}{1 + \alpha}$.

From (24) and condition (A) we get that, for sufficiently small $\delta_j$, there exists
at least one interval for which the first member of (68) is less than the second
member. Hence

(69)  $S < \frac{2}{1 + \alpha},$

where

(70)  $S = \sum_{i=1}^{k} \lim_{m\to\infty} E\!\left(\frac{U_{ij}}{m}\right).$


Now take j so large that

(71)  $3\rho\,\delta_j < \epsilon,$

where

(72)  $0 < 3\epsilon < \frac{2}{1 + \alpha} - S.$

Since $\frac{U_{ij}}{m}$ converges stochastically to its expected value, from (65), (66), (70),
(71), and (72), it follows that, with a probability arbitrarily close to 1, for sufficiently
large m,

(73)  $\frac{U}{m} < \frac{2}{1 + \alpha} - \epsilon.$


From (23) and Theorem I we get

(74)  $\lim_{m\to\infty} \frac{u_0(m)}{m} = \frac{2}{1 + \alpha}.$

Theorem II follows easily from (73) and (74).
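
As an informal numerical aside, which is not part of the original argument, the number of runs U can be computed directly from two samples. The sketch below is in Python; the function names are assumptions made here for illustration, and ties between observations are assumed not to occur. The null expectation 2mn/(m + n) + 1 is the standard value for the number of runs; dividing it by m, with m/n = α, gives the limit 2/(1 + α) that appears in (74).

```python
def count_runs(labels):
    """Number of maximal blocks of equal consecutive labels."""
    if not labels:
        return 0
    return 1 + sum(1 for a, b in zip(labels, labels[1:]) if a != b)

def runs_from_samples(xs, ys):
    """Pool both samples, sort by size, and count the runs of X's and Y's."""
    merged = sorted([(x, 0) for x in xs] + [(y, 1) for y in ys])
    return count_runs([label for _, label in merged])

def null_expected_runs(m, n):
    """Expected number of runs when both samples come from one population."""
    return 2 * m * n / (m + n) + 1

print(runs_from_samples([1, 2, 3], [4, 5, 6]))   # fully separated samples
print(runs_from_samples([1, 3, 5], [2, 4, 6]))   # fully interleaved samples
print(null_expected_runs(3, 3))
```

Well separated samples give few runs, which is the direction in which the test rejects.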


8. Remarks on a proposed test. We have already remarked in Section 3 that
the test proposed by W. R. Thompson is not consistent. To show this, we shall
give two distribution functions f(x) and g(x) such that, although these functions
will be very different, the probability of rejecting the hypothesis that they are
the same will not approach one as the sample number approaches infinity.

Suppose, to simplify the notation, that the observations have been ordered
according to size, i.e., that $x_1 < x_2 < \cdots < x_m$ and $y_1 < y_2 < \cdots < y_n$. Suppose
further that m = n, and that the test is to be made on a level of significance
$\beta > 0$. In the right member of (2) we need not exhibit n and shall replace
k and k' by k(m) and k'(m) to show the dependence on m. We have, under the
null hypothesis,

(75) P{ xn(„o < yi‘(m i) = i(m, k(m), V(m)) = 0. 


The sequence $\frac{k(m)}{m}$ is bounded, so that there exists a monotonically increasing
subsequence $m_1, m_2, \dots$ of the sequence of integers 1, 2, ... and a number h,
$0 \le h \le 1$, such that

(76)  $\lim_{i\to\infty} \frac{k(m_i)}{m_i} = h.$

It is easy to see that then also

(77)  $\lim_{i\to\infty} \frac{k'(m_i)}{m_i} = h.$


We shall now assume that 0 < h < 1. If h = 0 or 1 only a trivial alteration
will be needed in the argument to follow. Let $\epsilon$ and $\delta$ be arbitrarily small positive
numbers. We now consider two populations, A and B, described as follows:

A)  $f(x) = g(x) = x \qquad (0 \le x \le 1),$

B)  $f(x) = x \qquad (0 \le x \le 1),$

    $g(x) = g(a_i) + \frac{(x - a_i)\,(g(a_{i+1}) - g(a_i))}{a_{i+1} - a_i} \qquad (a_i \le x \le a_{i+1};\ i = 0, 1, \dots, 4),$

where


$a_0 = 0$,  $g(a_0) = 0$
$a_1 = h - 2\delta > 0$,  $g(a_1) = 0$
$a_2 = h - \delta$,  $g(a_2) = a_2$
$a_3 = h + \delta < 1 - \delta$,  $g(a_3) = a_3$
$a_4 = 1 - \delta$,  $g(a_4) = 1$
$a_5 = 1$,  $g(a_5) = 1$


The definition of f(x) and g(x) outside the interval $0 \le x \le 1$ is obvious. It
will be shown that even for such different populations as A and B and for
samples of size greater than that of any arbitrarily assigned number, the probability
of rejecting the null hypothesis if B is true will be at most $\beta + \epsilon$.

Let $k_1$, $k_2$, $k_3$ denote the number of observations on X which fall in the
intervals $0 \le x < a_2$, $a_2 \le x \le a_3$, $a_3 < x \le 1$, respectively (m fixed, of course).
Let $k_1'$, $k_2'$, $k_3'$ be the corresponding numbers for Y. For a fixed m, the probability
of a set $k_1, k_2, k_3, k_1', k_2', k_3'$ is the same whether the sample be drawn
from the population A or B. From (76), (77), and the multinomial law it follows
that for all sufficiently large $m_i$, the probability is at least $1 - \epsilon$ of the occurrence
of a set $k_1, k_2, k_3, k_1', k_2', k_3'$ for which $x_{k(m_i)}$ and $y_{k'(m_i)}$ will both fall in the
interval $a_2 \le x \le a_3$. Furthermore it is obvious that for all samples with fixed
$k_2$, $k_2'$ the distribution within the interval $a_2 \le x \le a_3$ is the same whether the
sample came from the population A or B. Hence even when the sample is
drawn from the population B, the first member of (75) is $\le \beta + \epsilon$. This completes
the proof of the inconsistency of the test based on (75).

This test is consistent if the alternatives to the null hypothesis are limited,
for example, to those where g(x) = f(x + c), c a constant.

THE SUBSTITUTIVE MEAN AND CERTAIN SUBCLASSES OF THIS

GENERAL MEAN

By Edward L. Dodd 

1. Introduction. No general agreement has been reached, so far as I know, 
as to what constitutes a mean. A necessary condition which appears to meet 
with general approval is that a single-valued mean of a set of numbers all equal 
to a constant c should itself be equal to c. However, there appears to be some 
valid objection against imposing any other proposed condition as necessary.

Of course, intermediacy is a condition that suggests itself at once. Indeed,
in certain mean value theorems in general analysis (such as the First Theorem
of the Mean for integral calculus, which I mention in Section 3) intermediacy
is the main feature.

However, O. Chisini [1] insisted that intermediacy or internality is not the
chief characteristic of a statistical mean. Rather, a mean is a number to take 
the place, by substitution, of each of a set of numbers in general different. 
Such a mean may well be called a representative or substitutive mean. 

Chisini defined m to be a mean of $x_1, x_2, \dots, x_n$, relative to a function F,
provided that

(1.1)  $F(m, m, \dots, m) = F(x_1, x_2, \dots, x_n).$

If, for example,

(1.2)  $F(x_1, x_2, \dots, x_n) = \sum x_i^2 = \sum m^2 = n m^2,$

the mean m thus obtained is the root-mean-square

(1.3)  $m = \pm\{(1/n)\sum x_i^2\}^{1/2}.$

The choice of F, Chisini noted, depended upon the use to be made of the
mean.

Suppose now that $f(x_1, x_2, \dots, x_n)$ is such a function that one value of

(1.4)  $f(x, x, \dots, x) = x.$

And suppose that this f is taken as a particular F for (1.1) to determine a mean
m implicitly, thus

(1.5)  $f(m, m, \dots, m) = f(x_1, x_2, \dots, x_n).$

Then, from (1.5) and (1.4) it follows that one value of

(1.6)  $f(x_1, x_2, \dots, x_n) = m.$

And thus f determines the mean m both explicitly and implicitly.

It should be noted that the $F = \sum x_i^2$ in (1.2) is not itself a mean of the $x_i$.

If, in (1.2), we take $x_1 = -2$, $x_2 = 1$, $x_3 = 1$, then the double-valued mean
$m = \pm 2^{1/2}$ results. Now $-2^{1/2}$ is internal; i.e., $-2 < -2^{1/2} < 1$, but $2^{1/2}$ is
external, for $2^{1/2} > 1 > -2$. But since here $\sum x_i = 0$, it follows also that the
standard deviation of -2, 1, 1 is the external mean $2^{1/2}$. Chisini [1], indeed,
used the root mean square to show the possibility of external means. External
means have been noted by other writers [2-7].
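
The example of -2, 1, 1 can be checked numerically. The following Python sketch is an illustration added here, with function names that are assumptions and not from the paper: it solves $n m^2 = \sum x_i^2$ for both values of m and reports whether each value lies between the smallest and largest variate.

```python
import math

def chisini_rms(xs):
    """Both values of m solving n*m**2 == sum(x**2), i.e. the root mean square."""
    root = math.sqrt(sum(x * x for x in xs) / len(xs))
    return -root, root

def is_internal(m, xs):
    """True when m lies between the smallest and the largest variate."""
    return min(xs) <= m <= max(xs)

xs = [-2, 1, 1]
neg, pos = chisini_rms(xs)
print(neg, is_internal(neg, xs))   # the negative value is internal
print(pos, is_internal(pos, xs))   # the positive value is external
```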

It is noteworthy that a number of writers [8-12] have used the condition
(1.4) (in general, with f single-valued) as one of a set of axioms to
characterize particular means. Sometimes, this has appeared in weaker form
as $f(1, 1, \dots, 1) = 1$.

This paper will be concerned primarily with the mean of a finite number, n,
of variates, $x_1, x_2, \dots, x_n$. Possible generalizations will be mentioned briefly
in Section 8.

In the conception of the substitutive mean, m, as I have been using it for some
time, emphasis is laid upon the explicit form for m; and provision is made for
multiple values.

Definition of the Substitutive Mean. Let $f(x_1, x_2, \dots, x_n)$ be a function
of n variables, $x_1, x_2, \dots, x_n$, defined at least for one set of equal values, $x_i = k$.
If c is any number such that $f(c, c, \dots, c)$ is defined, let one value of

(1.7)  $f(c, c, \dots, c) = c.$

Then $f(x_1, x_2, \dots, x_n)$ will be said to be a substitutive mean of $x_1, x_2, \dots, x_n$.

If an original formulation of a problem does not assign to a function a value
when the variables are all equal, it is sometimes possible to assign such values
by continuity considerations, such as are commonly used in the "evaluation"
of indeterminate forms. This will be discussed in Section 6.

In the following, when the word mean is used, it will designate the substitutive
mean as defined above.

2. Classification of Means already made. Some general classes of means
have already been distinguished. One important basis for a classification of
means is the kind of data to be used. The data may be only qualitatively
distinguishable. Then numbers may be assigned to qualities. For dealing in
a very general way with all kinds of data, C. Gini and L. Galvani [13], and
G. Pietra [14], distinguished between data in rectilineal series, in cyclical series,
and in unconnected series. These three classes are associated respectively with
the straight line, the circle, and a regular polyhedron (in three dimensions, the
regular tetrahedron, and in n dimensions, a polyhedron with n + 1 vertices each
at the same distance from each of the other n vertices).

For one definition of the arithmetic mean of a cyclical series, Gini uses the
center of gravity principle, and this mean is computed with the aid of sines and
cosines. By mechanical means, such an arithmetic mean of dates (for example,
of dates of weddings) as days of a year can be found. On the rim of a wheel
delicately suspended and marked off for the 365 days or 366 days of a year, let
small weights proportional to the number of weddings on a day be placed in the
spaces assigned to the individual days. Then when the wheel comes to rest,
the arithmetic mean of the dates will be found at the lowest point of the rim.
In the special case where the center of gravity of the system is at the center of
the circle, the mean is indeterminate, or we may say that every day is a mean
day.
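
The center of gravity computation with sines and cosines may be sketched as follows. This Python fragment is an illustration added here and not Gini's own formulation; the function name, the default period of 365 days, and the tolerance used to detect the indeterminate case are all assumptions.

```python
import math

def circular_mean_day(days, weights=None, period=365.0):
    """Centre-of-gravity mean of dates on a circle; None if indeterminate."""
    if weights is None:
        weights = [1.0] * len(days)
    # Each date becomes a point on the unit circle carrying its weight.
    c = sum(w * math.cos(2 * math.pi * d / period) for d, w in zip(days, weights))
    s = sum(w * math.sin(2 * math.pi * d / period) for d, w in zip(days, weights))
    # Centre of gravity at the centre of the circle: every day is a mean day.
    if math.hypot(c, s) < 1e-9 * sum(abs(w) for w in weights):
        return None
    angle = math.atan2(s, c) % (2 * math.pi)
    return angle * period / (2 * math.pi)

print(circular_mean_day([350, 20]))    # days near the turn of the year
print(circular_mean_day([0, 182.5]))   # diametrically opposite dates
```

For days 350 and 20 the cyclical mean falls early in the year, whereas the ordinary arithmetic mean, 185, falls in midsummer.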

Also, for cyclical series the arithmetic mean and the median are defined by 
other methods, using such principles as minimizing the sum of the squares of 
deviations or the sum of the absolute deviations. 

The properties of means may be made the basis of a classification, either those 
properties which have been evolved by writers [8-12], [15-18] who have char¬ 
acterized specific means by sets of axioms, or those properties which seem of 
special importance in making distinctions. Two such properties will now be 
mentioned. 

Gini [19] recognizes two large classes of means: “A) medie ferme, B) medie
lasche,” the latter (loose) class including the median and mode for which values
do not depend upon all the data. To describe this latter mean m of arguments
$x_i$, we might write $\partial m/\partial x_i = 0$ as applying to several if not most of the arguments
over wide ranges instead of at isolated points.

Subclasses of A or firm means as given by Gini will be discussed in Section 4. 

Another rather large classification distinguishes between simple means and
their weighted forms. In a case often encountered, where the weights are
whole numbers indicating frequencies of occurrence, this distinction is of little
significance. In the more general case, however, where weights may give ratings
of the efficiency of measuring instruments or the weights may be negative [6,
20], more direct attention needs to be paid the weighted forms.

To supplement classifications already proposed, I am indicating in the next
section a descent from the substitutive mean, the most general of all means,
down through two classes of means less general, which I am calling the summational
mean and the quasi-arithmetic mean, to the more specific mean known
as the associative mean, studied in particular by M. Nagumo [21], A. Kolmogoroff
[22], and B. de Finetti [2].

The foregoing subclasses of the general or substitutive mean are based 
primarily on structure, the way the mean is formed. 

3. The Summational Mean, Quasi-Arithmetic Mean, and Associative Mean. 

The summational mean, now to be defined, is a generalization of the weighted
arithmetic mean,

(3.1)  $W = \frac{c_1 x_1 + c_2 x_2 + \cdots + c_n x_n}{c_1 + c_2 + \cdots + c_n}, \qquad \sum c_i \ne 0.$

It is to be noted that although W is not a symmetric function of $x_i$, W is a
symmetric function of $c_i x_i$. In the generalization Q, the following features of
W are retained:

1. Certain weights $c_i$ being given, Q is a symmetric function of $c_i x_i$.

2. This Q may be determined from sums of n terms, each term involving
one and only one $x_i$.

Definition. Let $\sum$ denote a summation for i = 1, 2, ..., n. Suppose that

(3.2)  $F\{y,\ \sum f_1(c_i x_i, y),\ \sum f_2(c_i x_i, y),\ \dots,\ \sum f_k(c_i x_i, y)\} = 0$

has a solution, y = Q, which is a substitutive mean of $x_1, x_2, \dots, x_n$. Then Q
will be called a summational mean of $x_1, x_2, \dots, x_n$, relative to the functions $f_1$,
$f_2$, ..., $f_k$, and F.

Sometimes it is possible to express Q as

(3.3)  $Q = G\{\sum g_1(c_i x_i),\ \sum g_2(c_i x_i),\ \dots,\ \sum g_k(c_i x_i)\}.$

Among summational means, those of most frequent use involve in a special
way but one summation. Thus with $\psi(x)$ a function, which would usually be
taken as continuous, this m satisfies

(3.4)  $\psi(m)\sum c_i = \sum c_i \psi(x_i).$

But this, with $c_i > 0$, is just an algebraic analogue or prologue to the First
Theorem of the Mean for integral calculus, the $c_i$ to be replaced by a positive
integrable function. Without further specification, this mean m may have an
uncountably infinite number of values. But if it be required that $\psi(x)$ be a
continuous increasing function, and that $c_i > 0$, then m is unique.

In a series of papers, C. E. Bonferroni [20], [23-27] used means such as m in
(3.4) for statistical and actuarial problems. And, as he had in mind [28] distinctly
the notion of substitution, he was in a sense a forerunner of Chisini.
E. L. Dodd [29] made use of a mean m defined with the aid of n continuous increasing
functions $\psi_i$ thus:

(3.5)  $\sum c_i \psi_i(m) = \sum c_i \psi_i(x_i), \qquad c_i > 0.$

If $g_i(x) = c_i \psi_i(x)$, this can be written

(3.6)  $\sum g_i(m) = \sum g_i(x_i).$

In one paper, C. E. Bonferroni [20], as already noted, used weights which
might be either positive or negative.

Some such mean as m in (3.4) has been used by a number of writers. Here
$\psi(m)$ is a weighted arithmetic mean of $\psi(x_i)$; and thus it is natural to call m a
quasi-arithmetic mean of $x_i$.

Definition. Let $\sum c_i \ne 0$. If m is a solution of

(3.4)  $\psi(m)\sum c_i = \sum c_i \psi(x_i),$

then m will be called a quasi-arithmetic mean of $x_i$, with weights $c_i$, and relative to
the function $\psi(x)$.

Sufficient conditions for the existence of this mean m are: (1) That $\psi(x)$ be
continuous in the interval I, finite or infinite, in which the observations $x_i$ lie;
(2) That either $c_i > 0$ for each i, or that $\psi(x)$ take on all real values, as x runs
through I.

It will be helpful to picture geometrically the double transformation or mirroring
represented by (3.4). Points $x_i$ on the horizontal axis are carried vertically
to the curve $y = \psi(x)$ and then reflected horizontally to the y-axis. For the
points $y_i$ on the y-axis thus obtained the arithmetic mean $\bar{y}$ or “center of
gravity” is obtained. Then $\bar{y}$ is carried horizontally to the curve and reflected
vertically to the x-axis. The abscissas m of points on the x-axis thus obtained
are means of the given $x_i$, relative to this $\psi(x)$.
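
The double mirroring just described may be sketched in Python for the simple case in which the function is continuous and strictly monotonic, so that its inverse is single valued. This is an illustration added here; the function name and the requirement that the caller supply the inverse are assumptions of the sketch.

```python
import math

def quasi_arithmetic_mean(xs, psi, psi_inv, weights=None):
    """Solve psi(m) * sum(c) == sum(c * psi(x)) for m, given the inverse of psi."""
    if weights is None:
        weights = [1.0] * len(xs)
    total = sum(weights)
    if total == 0:
        raise ValueError("the weights must not sum to zero")
    # Carry each x to the curve, average on the y-axis, and reflect back.
    y_bar = sum(c * psi(x) for c, x in zip(weights, xs)) / total
    return psi_inv(y_bar)

xs = [1.0, 4.0, 16.0]
geometric = quasi_arithmetic_mean(xs, math.log, math.exp)
root_mean_square = quasi_arithmetic_mean(xs, lambda x: x * x, math.sqrt)
print(geometric, root_mean_square)
```

With the logarithm the geometric mean is obtained, and with the square (for positive variates) the root mean square.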

It may happen (Dodd [3, p. 746]) that the curve $y = \psi(x)$ contains horizontal
segments, as in the curve for temperature y of ice-water-steam which has absorbed
a quantity x of heat. In this case the mean m may be an “interval,”
an uncountable set of real numbers. Indeterminateness over an interval is a
well known feature of the median of an even number of variates. In fact, a paper 
of D. Jackson [30] was for the purpose of indicating one method of selecting a 
single value from this interval of indeterminateness, as a median. 

It may be noted that a mean of n variables becomes, when n = 1, a function 
of a single variable; and thus it appears possible to implant in a mean of n 
variables almost any peculiarity found in a function of one variable. 

A special case of the quasi-arithmetic mean is the associative mean m which
under some general conditions has been shown [2, 21, 22] to satisfy

(3.7)  $n f(m) = \sum f(x_i), \qquad i = 1, 2, \dots, n;$

where f(x) is a continuous increasing function.

If $f_n(x_1, x_2, \dots, x_n)$ is an associative mean, then by definition, $f_n(x_1,
x_2, \dots, x_n)$ is unaltered when any k of the n variates are each replaced by the
mean $f_k$ of that set.

4. The Gini means as summational. Having distinguished firm means from
loose means, Gini [19] noted that in the former class, a variate might appear as
a base, as an exponent, or both as base and exponent. In general, these variates
are to be positive. Gini then listed ten means of a decidedly broad character,
some of them generalizing the combinatorial means treated by A. Durand [31]
and O. Dunkel [32]. See also G. Pietra [37].

These ten means involve only the four simple arithmetic operations and root
extraction. For many purposes they are best expressed in the form given by
the author. However, to show that these means are summational, logarithms
will be used to reduce products to sums.

Let

(4.1)
$S_p = \sum x_i^p$, i = 1, 2, ..., n;
${}_nC_c = n!/c!\,(n - c)!$, a binomial coefficient;
$P_c$ be any one of the ${}_nC_c$ products of c different elements taken from $x_1, x_2, \dots, x_n$;
$P_c^p = (P_c)^p$, the p-th power of $P_c$;
$Z_c = \sum P_c$, the sum of all the ${}_nC_c$ products $P_c$;
$Z_c^p = \sum P_c^p$.

In the expressions which follow, it is assumed that the denominators are not 
zero. 

The ten means, as defined in Gini's Equations I, II, ..., X, will be designated
here by $m_1, m_2, \dots, m_{10}$; and their logarithms, with base arbitrary, will now
be given.

(4.2)
$\log m_1 = (\log S_p - \log n)/p$
$\log m_2 = (\log Z_c - \log {}_nC_c)/c$
$\log m_3 = (\log Z_c^p - \log {}_nC_c)/cp$
$\log m_4 = (\log S_p - \log S_q)/(p - q)$
$\log m_5 = \sum x_i^p \log x_i / S_p$
$\log m_6 = (\log Z_c - \log Z_d - \log {}_nC_c + \log {}_nC_d)/(c - d)$
$\log m_7 = (\log Z_c^p - \log Z_d^p - \log {}_nC_c + \log {}_nC_d)/(c - d)p$
$\log m_8 = (\log Z_c^p - \log Z_c^q)/c(p - q)$
$\log m_9 = \sum P_c^q \log P_c / c Z_c^q$
$\log m_{10} = (\log Z_c^p - \log Z_d^q - \log {}_nC_c + \log {}_nC_d)/(cp - dq).$

As noted by the author, the foregoing include some well known special means.
Thus, $m_1$ is the power mean, which for p = 1, 2, -1, becomes respectively the
arithmetic mean, the root mean square, and the harmonic mean. If $p \to 0$,
then the limit of $m_3$ and of $m_1$ is the geometric mean. If p = 0, 1, 2, and q =
p - 1, then $m_4$ is respectively the harmonic, the arithmetic, and the contra-harmonic
mean.
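
The special cases just named can be checked numerically from the formulas for the first and fourth means, $m_1 = (S_p/n)^{1/p}$ and $m_4 = (S_p/S_q)^{1/(p-q)}$. The Python sketch below is an illustration added here; the function names are assumptions, the variates are taken positive, and the geometric mean is approached with a small p and not reached at p = 0.

```python
def power_sum(xs, p):
    """S_p, the sum of the p-th powers of the variates."""
    return sum(x ** p for x in xs)

def gini_m1(xs, p):
    """The power mean (S_p / n)**(1/p), for p != 0."""
    return (power_sum(xs, p) / len(xs)) ** (1.0 / p)

def gini_m4(xs, p, q):
    """The mean (S_p / S_q)**(1/(p - q)), for p != q."""
    return (power_sum(xs, p) / power_sum(xs, q)) ** (1.0 / (p - q))

xs = [1.0, 2.0, 4.0]
print(gini_m4(xs, 0, -1))   # harmonic mean, 12/7
print(gini_m4(xs, 1, 0))    # arithmetic mean, 7/3
print(gini_m4(xs, 2, 1))    # contra-harmonic mean, 3
print(gini_m1(xs, 1e-8))    # close to the geometric mean, 2
```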

For each of the ten means, Gini gives an appropriate name. Those involving 
binomial coefficients are combinatorial, a mean like the contra-harmonic with 
denominator other than a constant is biplanar, the more simple means 
monoplanar. 

When in the following, I show that certain combinatorial expressions may be
replaced by sums, it is not implied that this replacement would simplify
computation.

To prove that $m_1, m_2, \dots, m_{10}$ are all summational means, it may be noted
that n, p, q, c, d, ${}_nC_c$, and ${}_nC_d$ are constants. Moreover, $S_p$ is the symmetric
sum of the p-th powers of $x_i$, thus with only one $x_i$ in each term, and
i = 1, 2, ..., n. And, since $Z_c$, $Z_c^p$, ..., and $Z_d^q$ are symmetric polynomials in
the $x_i$, they may be expressed as polynomials in $S_1, S_2, \dots$, by a well known
theorem of algebra. Hence among the ten means, the only one that requires
special attention is the ninth mean, $m_9$.
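
The well known theorem of algebra referred to may be taken to be Newton's identities, which give each sum of products of c different elements in terms of the power sums $S_1, S_2, \dots, S_c$. The Python sketch below is an illustration added here, with assumed function names; it compares the recursion with a direct enumeration of the products.

```python
from itertools import combinations
from math import prod

def elementary_from_power_sums(xs, c):
    """Z_c from the power sums S_1..S_c by Newton's identities."""
    S = [None] + [sum(x ** k for x in xs) for k in range(1, c + 1)]
    e = [1.0]                      # e[0] = 1 by convention
    for k in range(1, c + 1):
        # k * e_k = sum over i of (-1)**(i-1) * e_{k-i} * S_i
        e.append(sum((-1) ** (i - 1) * e[k - i] * S[i] for i in range(1, k + 1)) / k)
    return e[c]

def elementary_direct(xs, c):
    """Z_c as the sum of all products of c different elements."""
    return sum(prod(t) for t in combinations(xs, c))

xs = [1, 2, 3, 4]
print(elementary_from_power_sums(xs, 2), elementary_direct(xs, 2))
print(elementary_from_power_sums(xs, 3), elementary_direct(xs, 3))
```

The recursion uses only sums of n terms with a single variate in each term, which is the property required of a summational mean.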

To show that $m_9$ is a summational mean, we need only examine the numerator
of the right member. Let this numerator be N.

(4.3)  $N = \sum P_c^q \log P_c.$

Then

(4.4)  $qN = (x_1^q x_2^q \cdots x_c^q)(\log x_1^q + \cdots + \log x_c^q) + \cdots .$

Thus, if we set $y_i = x_i^q$, we may write

(4.5)  $qN = (y_1 y_2 \cdots y_c)(\log y_1 + \cdots + \log y_c) + \cdots .$

The coefficient of $\log y_1$ in this right member is the sum of all products of c
different factors which include $y_1$.

Now, let $Y_r$ be the sum of the products of r different factors taken from
$y_1, y_2, \dots, y_n$ and let $T_r$ be the sum of the products of r different factors
taken from $y_2, y_3, \dots, y_n$. Then it is evident that

(4.6)  $Y_r = T_r + y_1 T_{r-1}; \qquad T_r = Y_r - y_1 T_{r-1}.$

If, now, we set $Y_0 = 1$, it follows that

(4.7)  $T_{c-1} = Y_{c-1} - y_1 Y_{c-2} + y_1^2 Y_{c-3} - \cdots + (-1)^{c-1} y_1^{c-1} Y_0.$

Hence, in qN, the coefficient of $\log y_1$ is

(4.8)  $y_1 T_{c-1} = y_1 Y_{c-1} - y_1^2 Y_{c-2} + \cdots + (-1)^{c-1} y_1^c Y_0.$

Thus in qN, the terms containing $\log y_1$ are

(4.9)  $Y_{c-1}\, y_1 \log y_1 - Y_{c-2}\, y_1^2 \log y_1 + \cdots + (-1)^{c-1} Y_0\, y_1^c \log y_1.$

Now let

(4.10)  $U_r = \sum y_i^r \log y_i, \qquad i = 1, 2, \dots, n.$

Then,

(4.11)  $qN = Y_{c-1} U_1 - Y_{c-2} U_2 + \cdots + (-1)^{c-1} Y_0 U_c.$

Thus, qN is here constructed from sums of n terms with but a single $y_i$ in any
term.
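
The identity reached for qN can be checked numerically. In the Python sketch below, added here as an illustration with assumed function names, qN is computed once from its definition as a sum over all products of c different y's, and once as the alternating sum of the products $Y_{c-r} U_r$ for r = 1, 2, ..., c, where $U_r = \sum y_i^r \log y_i$.

```python
from itertools import combinations
from math import log, prod

def q_times_N_direct(ys, c):
    """Sum over every product of c different y's of (product) * (log of product)."""
    return sum(prod(t) * sum(log(y) for y in t) for t in combinations(ys, c))

def q_times_N_summational(ys, c):
    """The same quantity from the sums Y_0..Y_{c-1} and U_1..U_c."""
    # Y[r] is the sum of products of r different y's; Y[0] = 1 (empty product).
    Y = [sum(prod(t) for t in combinations(ys, r)) for r in range(c)]
    U = [None] + [sum(y ** r * log(y) for y in ys) for r in range(1, c + 1)]
    return sum((-1) ** (r - 1) * Y[c - r] * U[r] for r in range(1, c + 1))

ys = [1.5, 2.0, 3.0, 0.5]
for c in (1, 2, 3):
    print(c, q_times_N_direct(ys, c), q_times_N_summational(ys, c))
```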

Likewise, with $y_i$ replaced by $x_i^q$, a term contains but a single $x_i$.

5. Transformations. A function $f(x_1, x_2, \dots, x_n)$ is not in general a mean
of its arguments $x_i$. However, it is often possible to make a substitution
$x_i = \phi(y_i)$ so that

(5.1)  $f[\phi(y_1), \phi(y_2), \dots, \phi(y_n)] = g(y_1, y_2, \dots, y_n)$

is a mean of its arguments $y_i$.

The required substitution is sometimes obvious, as in the case of the estimate
s of scale

(5.2)  $s = [(1/n)\sum (x_i - m)^2]^{1/2} = [(1/n)\sum y_i^2]^{1/2}.$

Here s is a mean of $y_i$, although it is not a mean of $x_i$.

Definition. Let $y = \psi(x)$, in general multiple valued, be defined in an interval
I, finite or infinite, the values of y lying in an interval J. Suppose that for
each y in J, there is at least one x in I such that $\psi(x) = y$. Let any such x be
designated by $\phi(y)$. Then $\phi(y)$ will be called the inverse of $\psi(x)$. It follows that
one value of

(5.3)  $\psi[\phi(y)] = y.$

Theorem. Let

(5.4)  $z = f(x_1, x_2, \dots, x_n),$

in general multiple valued, be defined when each $x_i$ is in some interval I, finite or
infinite. With x in I, set

(5.5)  $\psi(x) = f(x, x, \dots, x);$

and suppose that $y = \psi(x)$ has an inverse, $x = \phi(y)$, defined in J. Let $x_i = \phi(y_i)$
be substituted into f to form the function

(5.6)  $w = f[\phi(y_1), \phi(y_2), \dots, \phi(y_n)] = g(y_1, y_2, \dots, y_n).$

Then w is a mean of $y_i$, defined when $y_i$ is in J. It is thus a mean of $\psi(x_i)$,
where $x_i$ is in I.

If further, $\psi(x)$ is a continuous increasing function of x, then for a given set of
$x_i$, the values of z and w are identical. The same is true for a given set of n values $y_i$.

Proof. If each $y_i = c$, a number in J, then

(5.7)  $f[\phi(y_1), \dots, \phi(y_n)] = f[\phi(c), \dots, \phi(c)] = \psi[\phi(c)].$

And one value of $\psi[\phi(c)]$ is c, from the definition of the inverse function $\phi(y)$.
Moreover, if a number c' is taken in I, then $\psi(c')$ is some number in J, which
we may call c; and the argument above is applicable. Finally, if $\psi(x)$ is continuous
and increasing, then a number $x_i$ in I is associated with one and only
one $y_i$ in J; and vice versa. Thus w and z become identical.
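
A small instance of the theorem may make the construction concrete. In the Python sketch below, added here as an illustration, f is taken to be the sum of squares, which is not a mean of its arguments; the interval I is assumed to be the non-negative numbers, so that the function obtained by setting all arguments equal, $n x^2$, is continuous and increasing. The function names are assumptions.

```python
import math

def f(xs):
    """Sum of squares: not a mean, since f(c, ..., c) = n * c**2."""
    return sum(x * x for x in xs)

def psi(x, n):
    """psi(x) = f(x, x, ..., x) with n equal arguments."""
    return f([x] * n)

def phi(y, n):
    """Inverse of psi on the non-negative numbers."""
    return math.sqrt(y / n)

def g(ys):
    """w = f[phi(y_1), ..., phi(y_n)], a mean of the y's."""
    n = len(ys)
    return f([phi(y, n) for y in ys])

print(g([7.0, 7.0, 7.0, 7.0]))            # substitutive property: returns 7
xs = [1.0, 2.0, 3.0]
print(f(xs), g([psi(x, 3) for x in xs]))  # z and w agree
```

Here g turns out to be the arithmetic mean of the $y_i$.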

In the foregoing, we started with f which is not a mean of its arguments $x_i$,
and obtained g which is a mean of $y_i$. Something like the reverse of this is
possible. The last member of (5.2) is a mean of $y_i$. It was obtained by treating
m as a constant, with respect to $x_i$. If, however, m is an estimate for
location and is taken as $(1/n)\sum x_i$, and this is substituted into (5.2) then

(5.8)  $s = \{[(n - 1)/n^2]\sum x_i^2 - (2/n^2)\sum x_i x_j\}^{1/2}, \qquad i < j.$

This s is now not a mean of $x_i$; for if each $x_i$ equal any constant c, then s = 0.
Furthermore, there exists no single valued continuous increasing function $x = \phi(y)$
such that if $x_i = \phi(y_i)$ is substituted into (5.8), s will be a mean of the
$y_i$. Thus the elimination of m from (5.2) interferes with the status of s as a
mean of the $x_i$.
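
The expanded form of s can be checked against the direct computation. The Python sketch below is an illustration added here, with assumed function names; it writes the coefficients as $(n - 1)/n^2$ and $2/n^2$, which is what direct expansion of $(1/n)\sum (x_i - m)^2$ with $m = (1/n)\sum x_i$ gives.

```python
import math
from itertools import combinations

def s_direct(xs):
    """[(1/n) * sum((x - m)**2)]**(1/2) with m the arithmetic mean."""
    n = len(xs)
    m = sum(xs) / n
    return math.sqrt(sum((x - m) ** 2 for x in xs) / n)

def s_expanded(xs):
    """The same quantity after eliminating m, using squares and products x_i*x_j, i < j."""
    n = len(xs)
    squares = sum(x * x for x in xs)
    cross = sum(a * b for a, b in combinations(xs, 2))
    value = (n - 1) / n ** 2 * squares - 2 / n ** 2 * cross
    return math.sqrt(max(0.0, value))   # guard against rounding just below zero

xs = [1.0, 2.0, 4.0, 7.0]
print(s_direct(xs), s_expanded(xs))
print(s_direct([3.0, 3.0, 3.0]))   # equal variates give 0, not 3
```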


6. Indeterminate Forms that arise in testing for Means. Sometimes a function
f is substantially continuous. But the investigation leading to the function
fails to assign to the function a value for certain values of the argument x,
or arguments, $x_1, x_2, \dots, x_n$. However, values are often assignable which
will make the function continuous. This is the usual occurrence when, in curve
fitting, parameters are estimated. In general, the measurements are assumed
to be not all alike. However, when a general function such as $\sum x_i/n$ for location
is obtained, we do not hesitate to assign to this function the value c when
each $x_i = c$, to make the function continuous.

As another illustration of “indeterminate forms,” consider the Jackson [30]
median, M, of four numbers $x_1 \le x_2 < x_3 \le x_4$, viz.,

(6.1)  $M = (x_4 x_3 - x_2 x_1)/(x_4 + x_3 - x_2 - x_1).$

A direct substitution of $x_i = c$ renders M indeterminate. But if $x_i \to c$,
indeed, if merely $x_2 \to c$ and $x_3 \to c$, so also does M.
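
The behavior of M near the indeterminate point can be seen numerically. The Python sketch below is an illustration added here; the function name is an assumption, and the case of a vanishing denominator is simply reported as an error.

```python
def jackson_median(x1, x2, x3, x4):
    """M = (x4*x3 - x2*x1) / (x4 + x3 - x2 - x1) for x1 <= x2 <= x3 <= x4."""
    denominator = x4 + x3 - x2 - x1
    if denominator == 0:
        raise ZeroDivisionError("M is indeterminate when all four numbers are equal")
    return (x4 * x3 - x2 * x1) / denominator

print(jackson_median(1.0, 5.0, 5.0, 9.0))   # x2 = x3 = 5 alone already gives 5
for eps in (1e-1, 1e-3, 1e-6):
    print(jackson_median(2 - 2 * eps, 2 - eps, 2 + eps, 2 + 2 * eps))   # stays at 2
```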

In a recent paper, R. Cisbani [33] generalizes means suggested by Dunkel 
[32] and L. Galvani [34] by setting up 

T -Jh l~ llz 

(6.2) y,(x) = In 1 2 (a’ + ih )~ xh J , j ^ 0, x ^ 0; 


and letting $n \to \infty$. There results an integral with the value

(6.3)  $\left[\frac{b^{x+j} - a^{x+j}}{(x/j + 1)\,(b^j - a^j)}\right]^{1/x},$

for the case $x \ne -j$. This mean, set up as a mean of an infinite number of variates,
turns out to be also a mean of the two numbers a and b, which for b = a becomes
indeterminate. But as b approaches a, so also does $y_j(x)$ approach a.
This is also true for the special cases x = -j, etc.

In testing to see if a function m of $x_i$ is a mean of these numbers, a difficulty
sometimes arises, because a substitution of $x_i = c$ and m = c into the equation
which implicitly defines m will put zeros into denominators. An aid in such
testing will now be formulated as a theorem, although the ideas involved are
not essentially new.

Theorem. Let f(x) be a continuous increasing function of x, defined for each
real x. Let

(6.4)  $f(0) = 0.$

Given n real distinct numbers

(6.5)  $x_1 < x_2 < \cdots < x_{n-1} < x_n,$

n positive numbers, $k_i$, and a real number C.

Set

(6.6)  $F(x) = \frac{k_1}{f(x_1 - x)} + \frac{k_2}{f(x_2 - x)} + \cdots + \frac{k_n}{f(x_n - x)} - C.$

Then F(x) = 0 has n - 1 real roots $m_i$, such that

(6.7)  $x_1 < m_1 < x_2 < m_2 < \cdots < m_{n-1} < x_n;$

also, a root less than $x_1$, provided

(6.8)  $\sum k_i / f(+\infty) < C;$

or a root greater than $x_n$, provided

(6.9)  $\sum k_i / f(-\infty) > C.$

Proof. Since f(x) is a continuous increasing function of x, so also is
$k_i/f(x_i - x)$, except for the single value, $x = x_i$. So also, then, is F(x), except
when $x = x_1$ or $x_2$ or ... or $x_n$. But

(6.10)  $F(x_i + 0) = -\infty; \qquad F(x_{i+1} - 0) = +\infty.$

Hence, between $x_i$ and $x_{i+1}$, there exists a root, $m_i$, of F(x) = 0.

Moreover, since

(6.11)  $F(-\infty) = [\sum k_i / f(+\infty)] - C; \qquad F(x_1 - 0) = +\infty;$

it follows that there is a root less than $x_1$, provided (6.8) is satisfied. Likewise,
there is a root greater than $x_n$ if (6.9) is satisfied.

The use of this theorem in testing for means is simple. Keeping the $x_i$ distinct,
the equation F(x) = 0 determines (n - 1) numbers, $m_j$, such that if
$x_i \to c$, so also do these $m_j \to c$. Employing continuity to define $m_j$ when each
$x_i = c$, we may say that each $m_j$ is a mean of $x_i$; j = 1, 2, ..., (n - 1); i =
1, 2, ..., n, when the conditions of this theorem are satisfied. If F(x) = 0 has
still another root, m, this m will not in general be a mean of $x_i$.
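
The interlacing of the roots can be exhibited numerically. The Python sketch below is an illustration added here and not a method given in the paper: it assumes f(u) = u by default, and it locates the root in each interval between consecutive $x_i$ by bisection, relying on F increasing there from large negative to large positive values. The function names and the small offsets from the endpoints are assumptions.

```python
def F(x, xs, ks, C, f=lambda u: u):
    """F(x) = sum of k_i / f(x_i - x), minus C."""
    return sum(k / f(xi - x) for xi, k in zip(xs, ks)) - C

def interlaced_roots(xs, ks, C, f=lambda u: u, iterations=200):
    """One root of F(x) = 0 strictly between each pair of consecutive x's."""
    roots = []
    for left, right in zip(xs, xs[1:]):
        gap = right - left
        lo, hi = left + 1e-12 * gap, right - 1e-12 * gap
        for _ in range(iterations):
            mid = (lo + hi) / 2
            # F is increasing on the interval, so a negative value means the root lies to the right.
            if F(mid, xs, ks, C, f) < 0:
                lo = mid
            else:
                hi = mid
        roots.append((lo + hi) / 2)
    return roots

print(interlaced_roots([1.0, 2.0, 4.0], [1.0, 1.0, 1.0], 0.0))
print(interlaced_roots([4.999, 5.0, 5.001], [1.0, 1.0, 1.0], 0.0))
```

With equal weights and C = 0 the two roots for 1, 2, 4 are $(7 \pm \sqrt{7})/3$; and when the $x_i$ crowd toward 5, so do the roots.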


7. Summational Means arising in the Estimation of Parameters of Frequency
Distributions. In curve fitting, the estimation of parameters leads in general
to summational means. If the method of moments is used, the first step is to
find the moments by summation. I have already considered estimates for
location and scale by this method [7], and by the R. A. Fisher method of maximum
likelihood [4]. A further study of the results of the likelihood method will
now be made.

By this method, products which first appear are reduced to sums by logarithms,
and the means found are, in general, summational. Some idea of the
forms of these means can be obtained by examining a rather general form of
frequency function which includes the Pearson Type I, and involves parameters
with estimates p > 0 and q > 0, in addition to the location m and scale a.
Let the observations be $x_1, x_2, \dots, x_n$; let

(7.1)  $t_i = (x_i - m)/a; \qquad 0 \le t \le 1; \qquad a > 0;$

(7.2)  $y = \frac{1}{a}\,\frac{\Gamma(p + q)}{\Gamma(p)\,\Gamma(q)}\; t^{p-1} (1 - t)^{q-1}.$

The likelihood L is obtained by multiplying together the n factors obtained
by substituting $t = t_1, t_2, \dots, t_n$.

Then

(7.3)  $\log L = -n \log a + n \log \Gamma(p + q) - n \log \Gamma(p) - n \log \Gamma(q) + (p - 1)\sum \log t_i + (q - 1)\sum \log (1 - t_i).$

From $\partial L/\partial m = 0$, there is obtained

(7.4)  $P \sum \frac{1}{x_i - m} + Q \sum \frac{1}{x_i - m - a} = 0; \qquad P = p - 1, \quad Q = q - 1.$

Suppose P 0 and Q ^ 0; and as a first case, suppose P + Q ^ 0. If each 
Xi is replaced by x, the above equation leads to m = x — ( Pa)/(P + Q). 

Then m is a summational mean of 

(7.5) (Pa)/(P + Q) i = 1, 2, • • • , n; 

as seen by applying the Theorem in Section 5. 

Likewise, a is a summational mean of 

(7.6) x! = (Xi - m)(P + Q)/P. 


If $P \neq 0$, $Q \neq 0$, but $P + Q = 0$, then (7.4) becomes

(7.7)  $\sum\dfrac{1}{x_i - m - a} = \sum\dfrac{1}{x_i - m}.$

Now set $y_i = x_i - m$, $C = \sum 1/y_i$; and write (7.7) as

(7.8)  $F(a) = \sum\dfrac{1}{y_i - a} - C = 0.$

This has the form given in (6.6) with x replaced by a, $h_i = 1$, $f(a) = a$. If then
$y_1 < y_2 < \cdots < y_n$, there exist (n − 1) solutions $a_i$ of F(a) = 0 between $y_1$
and $y_n$. And thus keeping the $y_i$ distinct, if $y_i \to c$, so also do the $a_i \to c$.
These $a_i$ are then means of $y_i$, and thus, means of $x_i - m$.
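The location of the (n − 1) solutions of (7.8) can be illustrated numerically. The following is only a sketch under stated assumptions: the function names `F` and `interior_roots` are hypothetical, the $y_i$ are taken distinct and non-zero, and plain bisection is used because F(a) rises from −∞ to +∞ between two consecutive $y_i$.

```python
def F(a, ys):
    """F(a) = sum 1/(y_i - a) - C, with C = sum 1/y_i, as in (7.8)."""
    return sum(1.0 / (y - a) for y in ys) - sum(1.0 / y for y in ys)


def interior_roots(ys, tol=1e-12):
    """Return one root of F in each gap between consecutive sorted y's.

    Assumes the y's are distinct and non-zero (illustrative sketch only).
    """
    ys = sorted(ys)
    roots = []
    for lo, hi in zip(ys, ys[1:]):
        a, b = lo, hi  # F -> -inf just above lo, +inf just below hi
        while b - a > tol * max(1.0, abs(b)):
            mid = 0.5 * (a + b)
            if mid == a or mid == b:
                break
            if F(mid, ys) < 0:
                a = mid
            else:
                b = mid
        roots.append(0.5 * (a + b))
    return roots
```

When the $y_i$ are crowded toward a common value c, every root returned is squeezed toward c as well, which is the mean property used in the text.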

In the more general case where $P + Q \neq 0$, it is seen also that Q is a
summational mean of

(7.9)  $P\left[\dfrac{a}{x_i - m} - 1\right].$

From $\partial L/\partial a = 0$, quite analogous results are obtained. The special case
now, however, is given by $P + Q + 1 = 0 = p + q - 1$. And, with the
continuity interpretation, a is a mean of $x_i - m$; and moreover, m is a mean of
$x_i - a$.

Using now the digamma function

(7.10)  $F(u) = \dfrac{d}{du}\log\Gamma(u),$

set

(7.11)  $D(p) = F(p+q) - F(p).$

The condition $\partial L/\partial p = 0$ then leads to

(7.12)  $D(p) = (1/n)\sum(-\log t_i), \quad 0 < t_i \le 1.$

Now, with q > 0, $D(\infty) = 0$, $D(-1+0) = \infty$; and D(p) is a continuous
decreasing function of p, when p > −1. Then, since $-\log t_i > 0$, there is a
unique p > −1 to satisfy (7.12).

To be useful here, p should be > 0. But, at all events, the p thus found is
a mean of $D^{-1}(-\log t_i)$, where $D^{-1}$ is inverse to D.
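As an illustration of how (7.12) might be solved in practice, here is a small sketch restricted to the useful range p > 0. Everything in it is an assumption for illustration: the names `digamma`, `D` and `solve_p` are hypothetical, the digamma function is approximated by the usual recurrence plus asymptotic series, and the root is found by bisection using the fact that D(p) decreases from +∞ to 0 on p > 0.

```python
import math


def digamma(x):
    """Approximate F(x) = d/dx log Gamma(x) for x > 0 (recurrence + asymptotic series)."""
    acc = 0.0
    while x < 6.0:          # F(x) = F(x + 1) - 1/x
        acc -= 1.0 / x
        x += 1.0
    inv2 = 1.0 / (x * x)
    return acc + math.log(x) - 0.5 / x - inv2 * (1.0 / 12 - inv2 * (1.0 / 120 - inv2 / 252))


def D(p, q):
    """D(p) = F(p + q) - F(p), as in (7.11)."""
    return digamma(p + q) - digamma(p)


def solve_p(ts, q, tol=1e-12):
    """Solve D(p) = mean(-log t_i) for p > 0 by bisection (sketch, assumes some t_i < 1)."""
    target = sum(-math.log(t) for t in ts) / len(ts)
    lo, hi = 1e-9, 1.0
    while D(hi, q) > target:   # D is decreasing, so enlarge hi until it brackets the root
        hi *= 2.0
    while hi - lo > tol * max(1.0, hi):
        mid = 0.5 * (lo + hi)
        if D(mid, q) > target:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)
```

For q = 1 the function reduces to D(p) = 1/p, which gives a convenient check: a mean of −log t equal to 2 should return p = 1/2.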

The function in (7.10) applies also in estimating the p in


(7.13)  $y = \dfrac{1}{a\,\Gamma(p+1)}\,e^{-t}\,t^{\,p}, \quad t = (x - m)/a, \quad p > -1.$

_ o, it is found that m is the arithmetic mean of x t — ae f(v+l> • 

r -> E (, - „>/», - rite the ££ ir: '-0 it 

1 “At JV* a “■* of *<-!«;<• IS the t«»« mo,,,’ of 

0, thertstwt ° n ' C mea " <’• ~ m,/ “' * 


(7.14)  $(1/n)\sum x_i = m + a(p+1);$


sr “ *• <»* t m r e ct rti 





8. Generalizations. The extension of results from the discrete or
discontinuous case, where a mean m depends upon only a finite number of elements,
to the continuous case is fairly immediate, with integration taking the place of
summation, and a distribution or frequency function taking the place of discrete
weights, $c_i$. Stieltjes and Lebesgue integrals may be used as well as Riemannian.
Such a generalization of the Chisini mean was given by de Finetti [2].

The summational mean, which I have defined as involving possibly several
summations, may be generalized likewise.

In terms of set functions, sometimes called functionelles, I gave [35] the
following general definition of a mean with a point set H in mind as a
distribution function.

Definition. Let E and H be sets of numbers. Such a number t may be a real
number or a vector number $t = (t_1, t_2, \ldots, t_k)$.

Let $E_t$ be the result of replacing each number of E by a single number t.

Then the mean m of numbers in E, relative to the set H, and to a function f, is
given by m = f(E, H); provided that the function f has been so constructed that
for each t in E, $f(E_t, H) = t$, or at least one value of this f is t. It is to be
understood above that when E is changed to $E_t$, the set H remains unaltered.

This retains the chief feature of $f(t, t, \ldots, t) = t$ in explicit form or of
$f(t, t, \ldots, t) = f(t_1, t_2, \ldots, t_n)$ in implicit form, where t is a mean of
$t_1, t_2, \ldots, t_n$.

I used [36] a somewhat less general definition to discuss regression coefficients.
All such means may well be called substitutive or representative.
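The implicit form $f(t, \ldots, t) = f(t_1, \ldots, t_n)$ lends itself to a short numerical illustration. The sketch below is not taken from the paper: the name `substitutive_mean` is hypothetical, the values are real numbers, and it is assumed that $f(t, \ldots, t)$ is continuous and monotonic in t between the smallest and largest value, so that bisection applies.

```python
import math


def substitutive_mean(f, values, tol=1e-12):
    """Find t with f([t, ..., t]) == f(values) by bisection on [min, max] (sketch)."""
    values = list(values)
    n = len(values)
    target = f(values)

    def g(t):
        return f([t] * n) - target

    lo, hi = min(values), max(values)
    g_lo, g_hi = g(lo), g(hi)
    if g_lo == 0:
        return lo
    if g_hi == 0:
        return hi
    if g_lo * g_hi > 0:
        raise ValueError("no sign change between the smallest and largest value")
    while hi - lo > tol * max(1.0, abs(lo), abs(hi)):
        mid = 0.5 * (lo + hi)
        g_mid = g(mid)
        if g_mid == 0:
            return mid
        if (g_mid > 0) == (g_lo > 0):
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)
```

Taking f as the sum gives the arithmetic mean, the product gives the geometric mean, and the sum of reciprocals gives the harmonic mean.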
THE PRODUCT SEMI-INVARIANTS OF THE MEAN AND A
CENTRAL MOMENT IN SAMPLES

By Cecil C. Craig 


The method developed by the author for calculating the semi-invariants and
product semi-invariants of moments in samples from any infinite population¹
is not immediately applicable to the calculation of product semi-invariants of
the mean and a central moment in such samples. In the present paper this
method is adapted for this purpose so that the calculation of these product
semi-invariants becomes routine. As will be seen, the computing is a little
heavier than in the case of central moments alone for results of equal weight.
A table of results up to weight ten for the mean and the second, third and fourth 
central moments is given. The author plans to apply these to a further study 
of the sampling characteristics of the coefficient of variation and Fisher’s t in 
samples from non-normal populations. 

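Let a random sample, $x_1, x_2, \ldots, x_N$, of N observations be drawn at random
from an infinite population characterized by the semi-invariants $\lambda_1, \lambda_2, \lambda_3, \ldots$.
The sample mean is

$\bar{x} = \sum_{i=1}^{N} x_i/N,$

and the n-th central moment of the sample is

$m_n = \sum_{i=1}^{N} (x_i - \bar{x})^{n}/N.$

Then the product semi-invariants of order kl of $\bar{x}$ and $m_n$, $S_{kl}(\bar{x}, m_n)$, are defined
by the formal identity in the parameters $\vartheta$ and $\omega$:

(1)  $(S_{10}\vartheta + S_{01}\omega) + \dfrac{1}{2!}(S_{10}\vartheta + S_{01}\omega)^{(2)} + \dfrac{1}{3!}(S_{10}\vartheta + S_{01}\omega)^{(3)} + \cdots = \log E(e^{\bar{x}\vartheta + m_n\omega}),$

in which E denotes the mathematical expectation over the set of all such
samples and

$(S_{10}\vartheta + S_{01}\omega)^{(r)} = \sum_{j=0}^{r}\binom{r}{j}\,S_{j,r-j}(\bar{x}, m_n)\,\vartheta^{j}\omega^{r-j}.$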

1 "An Application of Thiele’s Semi-invariants to the Sampling Problem;’’ Metron, Vol. 
VII, part IV (1928), pp. 3-75. 
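If we denote $E(\bar{x}^{k} m_n^{l})$ by $M_{kl}$, we have by definition the further formal identity
in $\vartheta$ and $\omega$:

$E(e^{\bar{x}\vartheta + m_n\omega}) = 1 + (M_{10}\vartheta + M_{01}\omega) + \dfrac{1}{2!}(M_{10}\vartheta + M_{01}\omega)^{(2)} + \cdots,$

in which $(M_{10}\vartheta + M_{01}\omega)^{(r)}$ is to be expanded in the same manner as
$(S_{10}\vartheta + S_{01}\omega)^{(r)}$ above.

Let us write

$\delta_i = x_i - \bar{x},$

and then

(2)  $E(e^{\bar{x}\vartheta + m_n\omega}) = E(e^{(\sum x_i)\vartheta/N + (\sum\delta_i^{n})\omega/N}).$

(Summations with respect to i and j always run from 1 to N.) Now we define
a new set of product semi-invariants, $\lambda_{k l_1 l_2 \cdots l_N}$, of the sum $\sum x_i$ and the N $\delta_i$'s, by
means of

$(\lambda_{10}\vartheta + \sum\lambda_{0i}\omega_i) + \dfrac{1}{2!}(\lambda_{10}\vartheta + \sum\lambda_{0i}\omega_i)^{(2)} + \cdots = \log E(e^{(\sum x_i)\vartheta + \sum\delta_i\omega_i}),$

in which, for example,

$(\lambda_{10}\vartheta + \sum\lambda_{0i}\omega_i)^{(2)} = \lambda_{20\cdots0}\,\vartheta^{2} + 2\lambda_{110\cdots0}\,\vartheta\omega_1 + 2\lambda_{1010\cdots0}\,\vartheta\omega_2 + \cdots + \lambda_{020\cdots0}\,\omega_1^{2} + \lambda_{0020\cdots0}\,\omega_2^{2} + \cdots + \lambda_{00\cdots02}\,\omega_N^{2}.$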

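We may set

$\delta_i = \sum_{j=1}^{N} a_{ij}x_j$  with  $a_{ij} = -\dfrac{1}{N}\ (i \neq j), \quad a_{ii} = \dfrac{N-1}{N}.$

Then

$E(e^{(\sum x_i)\vartheta + \sum\delta_i\omega_i}) = E(e^{\sum_i x_i(\vartheta + \sum_j a_{ij}\omega_j)}) = E(e^{\alpha_1 x_1})\cdot E(e^{\alpha_2 x_2})\cdots E(e^{\alpha_N x_N}),$

in which

$\alpha_i = \vartheta + \sum_j a_{ij}\omega_j.$

It follows then that

$(\lambda_{10}\vartheta + \sum\lambda_{0i}\omega_i) + \dfrac{1}{2!}(\lambda_{10}\vartheta + \sum\lambda_{0i}\omega_i)^{(2)} + \dfrac{1}{3!}(\lambda_{10}\vartheta + \sum\lambda_{0i}\omega_i)^{(3)} + \cdots = \lambda_1\sum\alpha_i + \lambda_2\dfrac{\sum\alpha_i^{2}}{2!} + \lambda_3\dfrac{\sum\alpha_i^{3}}{3!} + \cdots,$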
from which

$(\lambda_{10}\vartheta + \sum\lambda_{0i}\omega_i)^{(k+l)} = \lambda_{k+l}\sum_i(\vartheta + \sum_j a_{ij}\omega_j)^{k+l}.$

From this

$\lambda_{k00\cdots0} = \lambda_{k0} = N\lambda_k,$

$\lambda_{110\cdots0} = \lambda_{1010\cdots0} = \cdots = 0,$

and generally,²

(3)  $\lambda_{k l_1 l_2\cdots l_N} = \dfrac{\lambda_{k+l}}{N^{l}}\left[\sum_i (-1)^{l-l_i}(N-1)^{l_i}\right], \quad (l_1 + l_2 + \cdots + l_N = l).$

This is the first result to be used in calculating values of $S_{kl}$'s. Note that the
value of $\lambda_{k l_1 l_2\cdots l_N}$ is independent of the order in which a given set of $l_i$'s occur.

Calculation of particular $\lambda_{k l_1 l_2\cdots l_N}$'s in terms of N and the semi-invariants of
the sampled population is both simple and rapid as one may see from a pair
of examples:

$\lambda_{22} = \lambda_{202} = \lambda_{2002} = \cdots$

(suppressing superfluous zeros in the subscripts)

$= \dfrac{\lambda_4}{N^{2}}\left[(N-1)^{2} + (N-1)\right] = \dfrac{N-1}{N}\lambda_4.$

Then, too,

$\lambda_{k2} = \dfrac{N-1}{N}\lambda_{k+2}.$

For a second example:

$\lambda_{k43} = \dfrac{\lambda_{k+7}}{N^{7}}\left[-(N-1)^{4} + (N-1)^{3} - (N-2)\right] = -\dfrac{(N-2)(N^{2} - 3N + 3)}{N^{6}}\lambda_{k+7}.$
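Formula (3) is easy to mechanize. The sketch below is an illustration only; the function name `lambda_coefficient` and its calling convention (pass the non-zero $l_i$ and the sample size N, zeros are padded automatically) are assumptions made here, and exact rational arithmetic is used so that the coefficients can be compared with the worked examples.

```python
from fractions import Fraction


def lambda_coefficient(ls, N):
    """Coefficient c such that lambda_{k l_1 ... l_N} = c * lambda_{k+l}, following (3).

    `ls` lists the l_i (trailing zeros may be omitted); requires len(ls) <= N.
    """
    ls = list(ls) + [0] * (N - len(ls))
    l = sum(ls)
    total = sum((-1) ** (l - li) * (N - 1) ** li for li in ls)
    return Fraction(total, N ** l)
```

For N = 5, `lambda_coefficient([2], 5)` reproduces (N − 1)/N = 4/5, a single unit subscript gives zero, and `lambda_coefficient([4, 3], 5)` reproduces $-(N-2)(N^{2}-3N+3)/N^{6}$.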

Now the semi-invariants, $S_{kl}$, can be expressed directly in terms of the
product moments, $\nu_{k l_1 l_2\cdots l_N}$, of the sum $\sum x_i$ and the N $\delta_i$'s. These product
moments are given by the appropriate moment generating function:

$E(e^{(\sum x_i)\vartheta + \sum\delta_i\omega_i}) = 1 + (\nu_{10}\vartheta + \sum\nu_{0i}\omega_i) + \dfrac{1}{2!}(\nu_{10}\vartheta + \sum\nu_{0i}\omega_i)^{(2)} + \cdots.$


2 As written this result is valid if at least one of the $l_i$'s is zero, which is always the
case if N, the size of the sample, is greater than l. (Cf. the author's paper cited above,
p. 17.)
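Then it is seen that

$E(e^{(\sum x_i)\vartheta/N + (\sum\delta_i^{n})\omega/N}) = 1 + \dfrac{1}{N}[\nu_{10}\vartheta + (\sum\nu_{0n_i})\omega] + \dfrac{1}{2!\,N^{2}}[\nu_{10}\vartheta + (\sum\nu_{0n_i})\omega]^{(2)} + \cdots,$

in which

$[\nu_{10}\vartheta + (\sum\nu_{0n_i})\omega]^{(2)} = \nu_{20}\vartheta^{2} + 2(\nu_{1n} + \nu_{10n} + \nu_{100n} + \cdots)\vartheta\omega + (\nu_{0,2n} + \nu_{00,2n} + \nu_{000,2n} + \cdots)\omega^{2},$

etc., and by comparison with (1) and (2), we have

$(S_{10}\vartheta + S_{01}\omega) + \dfrac{1}{2!}(S_{10}\vartheta + S_{01}\omega)^{(2)} + \cdots = \log\left\{1 + \dfrac{1}{N}[\nu_{10}\vartheta + (\sum\nu_{0n_i})\omega] + \dfrac{1}{2!\,N^{2}}[\nu_{10}\vartheta + (\sum\nu_{0n_i})\omega]^{(2)} + \cdots\right\}.$

From this, writing W for $[\nu_{10}\vartheta + (\sum\nu_{0n_i})\omega]$,

(4)  $(S_{10}\vartheta + S_{01}\omega)^{(k+l)} = \dfrac{(k+l)!}{N^{k+l}}\sum\dfrac{(-1)^{\rho-1}(\rho-1)!\;W^{r}\,(W^{(2)})^{s}\,(W^{(3)})^{t}\cdots}{(1!)^{r}(2!)^{s}(3!)^{t}\cdots\; r!\,s!\,t!\cdots},$

in which

$r + s + t + \cdots = \rho,$

the summation extending over all partitions $(1^{r}2^{s}3^{t}\cdots)$ of k + l. This, of
course, is only the usual formula for semi-invariants in terms of moments
appropriately modified. In particular,

$(S_{10}\vartheta + S_{01}\omega)^{(2)} = \dfrac{1}{N^{2}}\left\{[\nu_{10}\vartheta + (\sum\nu_{0n_i})\omega]^{(2)} - [\nu_{10}\vartheta + (\sum\nu_{0n_i})\omega]^{2}\right\}.$

If we write

$[\nu_{10}\vartheta + (\sum\nu_{0n_i})\omega] = W,$

(5)  $(S_{10}\vartheta + S_{01}\omega)^{(3)} = \dfrac{1}{N^{3}}\left(W^{(3)} - 3W^{(2)}W + 2W^{3}\right),$

$(S_{10}\vartheta + S_{01}\omega)^{(4)} = \dfrac{1}{N^{4}}\left[W^{(4)} - 4W^{(3)}W - 3(W^{(2)})^{2} + 12W^{(2)}W^{2} - 6W^{4}\right].$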
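The partition sum in (4) is the ordinary moment-to-semi-invariant formula, and its one-variable analogue can be checked directly. The following sketch is only an illustration under that simplification: it works with a single variable and raw moments (so the factor $N^{k+l}$ and the symbolic powers of W do not appear), and the names `partitions` and `cumulant` are hypothetical.

```python
from fractions import Fraction
from math import factorial


def partitions(n, max_part=None):
    """Yield the integer partitions of n as dicts {part: multiplicity}."""
    if max_part is None:
        max_part = n
    if n == 0:
        yield {}
        return
    for part in range(min(n, max_part), 0, -1):
        for rest in partitions(n - part, part):
            p = dict(rest)
            p[part] = p.get(part, 0) + 1
            yield p


def cumulant(moments, n):
    """n-th semi-invariant from raw moments; moments[j] is the j-th raw moment."""
    total = Fraction(0)
    for p in partitions(n):
        rho = sum(p.values())  # number of parts, r + s + t + ...
        term = Fraction((-1) ** (rho - 1) * factorial(rho - 1))
        for part, mult in p.items():
            term *= Fraction(moments[part]) ** mult
            term /= factorial(part) ** mult * factorial(mult)
        total += term
    return factorial(n) * total
```

For n = 3 and n = 4 the sum reduces to the combinations shown in (5); with the raw moments 1, 2, 6, 24 of the unit exponential law it returns the semi-invariants 1, 1, 2, 6.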

Now the $\nu_{k l_1 l_2\cdots l_N}$'s can be replaced by their values in terms of the $\lambda_{k l_1 l_2\cdots l_N}$'s,
the details of which will be explained below, and it will be evident that any
$\nu_{k l_1 l_2\cdots l_N}$ is unaltered by a permutation of the $l_i$'s in its subscript. Taking
account of this, the formulae (5) may be written in the expanded forms:

$S_{21}(\bar{x}, m_n) = \dfrac{1}{N^{2}}\left[\nu_{2n} - \nu_{20}\nu_{0n} - 2\nu_{1n}\nu_{10} + 2\nu_{10}^{2}\nu_{0n}\right]$

$S_{12}(\bar{x}, m_n) = \dfrac{1}{N^{2}}\left[\nu_{1,2n} + (N-1)\nu_{1nn} - \nu_{10}\nu_{0,2n} - (N-1)\nu_{10}\nu_{0nn} - 2N\nu_{1n}\nu_{0n} + 2N\nu_{10}\nu_{0n}^{2}\right]$

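But, with no loss in generality, the origin may be taken at the population mean
so that $\lambda_1 = 0$. In this case it will be found that $\nu_{10} = 0$ and these formulae
become:

(6)
$S_{11}(\bar{x}, m_n) = \nu_{1n}/N$

$S_{21}(\bar{x}, m_n) = \dfrac{1}{N^{2}}\left[\nu_{2n} - \nu_{20}\nu_{0n}\right]$

$S_{12}(\bar{x}, m_n) = \dfrac{1}{N^{2}}\left[\nu_{1,2n} + (N-1)\nu_{1nn} - 2N\nu_{1n}\nu_{0n}\right]$

$S_{31}(\bar{x}, m_n) = \dfrac{1}{N^{3}}\left[\nu_{3n} - \nu_{30}\nu_{0n} - 3\nu_{1n}\nu_{20}\right]$

$S_{22}(\bar{x}, m_n) = \dfrac{1}{N^{3}}\left[\nu_{2,2n} + (N-1)\nu_{2nn} - 2N\nu_{2n}\nu_{0n} - \nu_{20}\nu_{0,2n} - (N-1)\nu_{20}\nu_{0nn} - 2N\nu_{1n}^{2} + 2N\nu_{20}\nu_{0n}^{2}\right]$

$S_{13}(\bar{x}, m_n) = \dfrac{1}{N^{3}}\left[\nu_{1,3n} + 3(N-1)\nu_{1,2n,n} + (N-1)(N-2)\nu_{1nnn} - 3N\nu_{1,2n}\nu_{0n} - 3N(N-1)\nu_{1nn}\nu_{0n} - 3N\nu_{1n}\nu_{0,2n} - 3N(N-1)\nu_{1n}\nu_{0nn} + 6N^{2}\nu_{1n}\nu_{0n}^{2}\right].$

These formulae are the second result used in the actual calculation of
$S_{kl}(\bar{x}, m_n)$'s. One begins with them, putting in the particular value of n for
the central moment in question. If for instance we wish to compute the product
semi-invariants of the mean and variance in samples of N, we begin with the
set of formulae:

(7)
$S_{11}(\bar{x}, m_2) = \nu_{12}/N$

$S_{21}(\bar{x}, m_2) = \dfrac{1}{N^{2}}\left[\nu_{22} - \nu_{20}\nu_{02}\right]$

$S_{12}(\bar{x}, m_2) = \dfrac{1}{N^{2}}\left[\nu_{14} + (N-1)\nu_{122} - 2N\nu_{12}\nu_{02}\right],$

etc.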

The second step is to replace the product moments $\nu_{k l_1 l_2\cdots}$ which appear by
their values in terms of the corresponding product semi-invariants. This process
can perhaps be best explained by some examples.
Consider the complete calculation of $S_{12}(\bar{x}, m_2)$. From the expression for the
fifth central moment in terms of semi-invariants,

$\nu_5 = \lambda_5 + 10\lambda_3\lambda_2,$

we can write the corresponding expression for product moments in terms of
product semi-invariants

(8)  $(\sum\nu_i\vartheta_i)^{(5)} = (\sum\lambda_i\vartheta_i)^{(5)} + 10(\sum\lambda_i\vartheta_i)^{(3)}(\sum\lambda_i\vartheta_i)^{(2)}.$

Then we get $\nu_{14}$ by comparing coefficients of $\dfrac{\vartheta_1\vartheta_2^{4}}{4!}$ and $\nu_{122}$ by comparing
coefficients of $\dfrac{\vartheta_1\vartheta_2^{2}\vartheta_3^{2}}{2!\,2!}$ in this identity. For an index as low as 5, these coefficients
are readily picked out by inspection; for larger indices the use of Hammond
operators reduces this to a mechanical routine.³ In this case we have

$D_1D_4(14) = (12)(02) + (03)(11).$

To the terms on the right the appropriate binomial coefficients must be applied
giving

$3(12)(02) + 2(03)(11).$

The total of these coefficients is $5 = \dfrac{5!}{4!\,1!}$, a necessary check. Then
multiplying these coefficients by 10/5, we have

$6\lambda_{12}\lambda_{02} + 4\lambda_{03}\lambda_{11}$

for the required coefficients in the second term in (8). Thus

$\nu_{14} = \lambda_{14} + (6\lambda_{12}\lambda_{02} + 4\lambda_{03}\lambda_{11}).$

The two terms in parentheses arise from the same term in (8) and would both
give rise to terms in $\lambda_3\lambda_2$ in the final result if $\lambda_{11}$ were not identically zero from (3).
In practice all terms in which $\lambda_{11}$ is a factor are crossed out as they appear.
Next

$D_1D_2^{2}(122) = 2(12)(02) + (111)(011) + 2(021)(11).$

($\lambda_{002} = \lambda_{02}$; $\lambda_{012} = \lambda_{021}$.) With the binomial, or multinomial coefficients attached,
the right member is rewritten

$6(12)(02) + 12(111)(011) + 12(021)(11).$


3 Cf. the author, loc. cit., p. 24.
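The total of these coefficients is $30 = \dfrac{5!}{1!\,2!\,2!}$. Then multiplying each
coefficient by 10/30, we have

$\nu_{122} = \lambda_{122} + (2\lambda_{12}\lambda_{02} + 4\lambda_{111}\lambda_{011} + 4\lambda_{012}\lambda_{11}).$

Going on with the calculation of $S_{12}(\bar{x}, m_2)$:

$\nu_{12} = \lambda_{12}, \quad \nu_{02} = \lambda_{02},$

and then we have:

$S_{12}(\bar{x}, m_2) = \dfrac{1}{N^{2}}\left[\{\lambda_{14} + (N-1)\lambda_{122}\} + \{6\lambda_{12}\lambda_{02} + (N-1)(2\lambda_{12}\lambda_{02} + 4\lambda_{111}\lambda_{011}) - 2N\lambda_{12}\lambda_{02}\}\right].$

The first set of terms within braces gives rise to terms in $\lambda_5$; the second to terms
in $\lambda_3\lambda_2$. Next

$\lambda_{14} = \dfrac{(N-1)(N^{2}-3N+3)}{N^{3}}\lambda_5, \qquad \lambda_{111} = -\dfrac{\lambda_3}{N},$

$\lambda_{122} = \dfrac{2N-3}{N^{3}}\lambda_5, \qquad \lambda_{03} = \dfrac{(N-1)(N-2)}{N^{2}}\lambda_3,$

$\lambda_{021} = -\dfrac{N-2}{N^{2}}\lambda_3, \qquad \lambda_{02} = \dfrac{N-1}{N}\lambda_2,$

$\lambda_{011} = -\dfrac{\lambda_2}{N}.$

This table of values will be of frequent use in further calculations of $S_{kl}$'s.
Giving the values of both $\lambda_{111}$ and $\lambda_{011}$ here was unnecessary duplication.
Now only the final reduction is to be carried out. We obtain

$S_{12}(\bar{x}, m_2) = \dfrac{N-1}{N^{4}}\left[(N-1)\lambda_5 + 4N\lambda_3\lambda_2\right].$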


This result of order 3 and of weight 5 follows a quite mechanical procedure 
and is quite brief. The length of the algebraic computations required grows 
rapidly as the weight is increased but for weights no greater than 10 undue labor 
is not required. For greater weights only time and patience is required to get 
results if they are needed. It is to be noted that by this method one may 
calculate individual terms in the result without doing any of the work required 
for the remaining terms and that one may readily shorten the work by getting 
results to a desired degree of approximation with respect to powers of 1/N.

There follows a table of the results so far calculated. 
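For n = 2:

$S_{11} = \dfrac{N-1}{N^{2}}\lambda_3$

$S_{21} = \dfrac{N-1}{N^{3}}\lambda_4$

$S_{12} = \dfrac{N-1}{N^{4}}\left[(N-1)\lambda_5 + 4N\lambda_3\lambda_2\right]$

$S_{31} = \dfrac{N-1}{N^{4}}\lambda_5$

$S_{22} = \dfrac{N-1}{N^{5}}\left[(N-1)\lambda_6 + 4N(\lambda_4\lambda_2 + \lambda_3^{2})\right]$

$S_{13} = \dfrac{N-1}{N^{6}}\left[(N-1)^{2}\lambda_7 + 12N(N-1)\lambda_5\lambda_2 + 4N(5N-7)\lambda_4\lambda_3 + 24N^{2}\lambda_3\lambda_2^{2}\right]$

It is not difficult to see that in general

$S_{k1} = \dfrac{N-1}{N^{k+1}}\lambda_{k+2}.$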
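The first two entries for n = 2 can be confirmed by exact enumeration on a small discrete population. The sketch below is an independent check, not the author's procedure: the name `check_n2` is hypothetical, sampling is taken to be N independent draws with equal probabilities from the listed values, and $S_{11}$ and $S_{21}$ are computed as the joint central moments $E[(\bar{x}-E\bar{x})(m_2-Em_2)]$ and $E[(\bar{x}-E\bar{x})^{2}(m_2-Em_2)]$, with which joint semi-invariants of orders two and three coincide.

```python
from fractions import Fraction
from itertools import product


def check_n2(values, N):
    """Return (S11, (N-1)/N^2 * lambda_3, S21, (N-1)/N^3 * lambda_4) by enumeration."""
    vals = [Fraction(v) for v in values]
    mean = sum(vals) / len(vals)

    def mu(r):  # r-th central moment of the population
        return sum((v - mean) ** r for v in vals) / len(vals)

    lam3 = mu(3)
    lam4 = mu(4) - 3 * mu(2) ** 2

    stats = []
    for sample in product(vals, repeat=N):  # every ordered sample is equally likely
        xbar = sum(sample) / N
        m2 = sum((x - xbar) ** 2 for x in sample) / N
        stats.append((xbar, m2))
    weight = Fraction(1, len(stats))

    def expect(g):
        return sum(g(x, y) for x, y in stats) * weight

    ex = expect(lambda x, y: x)
    em = expect(lambda x, y: y)
    s11 = expect(lambda x, y: (x - ex) * (y - em))
    s21 = expect(lambda x, y: (x - ex) ** 2 * (y - em))
    return s11, Fraction(N - 1, N ** 2) * lam3, s21, Fraction(N - 1, N ** 3) * lam4
```

Because rational arithmetic is used throughout, the enumerated values and the tabulated expressions can be compared for exact equality.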

For n = 3: 

q _(y-D(y-2), 

™- jp - 

a _(y-D(y- 2 ). 

OJi — ^4 X6 

Sn = W - ~ ^“- 2) KN -l)m- 2)X 7 + 9y(y - 2)X 5 X 2 
+ 27 N(N - 21X1X2 + ] 8JV 2 X«X|] 

o _ (y- i)(y-2), 

bai -jV 6 Xc 

,S 2 2 = yLlJ-W - V \(N - D(N - 2)X S + 9y(y - 2)X„X 3 

+ 36y(y - 2)x.r,X2 4- 27 n(n - 2)xi 4- isy’x^ 4- StW^SxJ 
[y(y - D a (y - 2) a x I0 

+ 9(y - i)(3y 4 - i2y 3 4- i2y 2 - .w + d)x 8 x 2 

4- 27NUN 4 - 21N 3 4- 36y a - 20N 4- 3)X,X 3 

4- 27N\N - 2) a (7 N - 11)X G X 4 4- 54N\N - 2)(4 N - 7 )X 0 X^ 
+ 27 N\N - 2) 2 (4 N - 7)X^ + 542V 3 (2V - 2) (232V - 50)X 6 X 3 X 2 
-1- 162 N\N - 2) (5 N - 12)X?X 2 + 542V 3 (292V 2 - 126iV + 140)X 4 X 2 
+ 1082V 4 (52V - 12)X 4 X 2 + 3242V 4 (52V - 12)xixJ]. 

For n = 4:

Sn = W 2 -32V + 3)X 6 + 62V(2V - l)X.Xj 
Sii = - 1 [(2V 2 -32V + 3)X 6 + 62V(2V - 1 )(X 4 Xi + X?)] 

Sn = - 1)(2V 2 - 3# + 3) 2 X 2 

+ 42V(2V S -3 N + 3) (72V 2 - 182V + 15)X 7 X» 

+ 42V (2V 2 - 32V + 3) (192V 2 - 662V + 63)X 6 X 3 
+ 42V(292V 4 - 1952V J + 5372V 2 - 6392V + 351)X 6 X, 

+ 122V 2 (172V 3 - 712V 2 + 1172V - 69)X S A^ 

+ 242V 2 (352V 3 - 1732V 2 + 3092V - 189)X 4 X 3 X 2 
+ 12N\N - 2)\SN - 5)X 3 + 962V 3 (42V 2 ~ 92V + 6)X 3 X 2 ] 

S 3 i = [(2V 2 -3 N + 3)X 7 + 62V(2V - 1)X B X 2 + 182V(2V - l)X 4 Xj 

[(2V - 1)(2V 2 - 32V + 3) 2 Xio 

+ 42V(2V 2 - 32V + 3)(72V 2 - 182V + 15)X 8 X 2 
+ 82V(2V 2 - 32V + 3) (132V 2 - 422V + 39)X 7 X 3 
+ 122V(162V 4 - 1062V 3 + 2852V 2 - 3602V + 1SO)X 0 X 4 
+ 122V 2 (172V 3 - 712V 2 + 1172V - 69)X 6 X^ 

+ 42V(292V 4 - 1952V 3 + 5372V 2 - 6932V + 351)X 2 
+ 482V 2 (262V 3 - 1252V 2 + 2132V - 129)X 6 A 3 A 2 
+ 242V 2 (352V 3 - 1732V 2 + 3092V - 189)X?Xi 
+ 242V 2 (622V 3 - 3262V 2 + 5972V - 369)X 4 X 2 
+ 962V 3 (42V 2 - 92V + 6)X 4 X 2 + 2882V 3 (42V 2 - 92V + 6)X 2 X 2 ] 


The University of Michigan, 
Ann Arbor, Mich. 



ON THE NON-EXISTENCE OF TESTS OF “STUDENT’S” HYPOTHESIS 
HAVING POWER FUNCTIONS INDEPENDENT OF a 

By George B. Dantzig 

1. Introduction. Consider a system of n random variables $x_1, x_2, \ldots, x_n$,
where each is known to be normally distributed about the same but unknown
mean, $\xi$, and with the same, but also unknown, standard deviation $\sigma$. The
assumption, $H_0$, that $\xi$ has some specified value, $\xi_0$, e.g. $\xi_0 = 0$, while nothing
is assumed about $\sigma$, is known as the "Student" Hypothesis. Two aspects of
the hypothesis $H_0$ have been already studied extensively. If the alternatives
with respect to which it is desired to test $H_0$ assume specifically that $\xi > \xi_0$
(or $\xi < \xi_0$), then we have the so-called asymmetric case of "Student's
Hypothesis" and it is known, [1], that there exists a uniformly most powerful test of $H_0$.
This consists in the rule, originally suggested by "Student," of rejecting $H_0$
whenever

(1)  $t = \dfrac{\bar{x} - \xi_0}{S}\sqrt{n-1} > t_\alpha,$

where $\bar{x}$ and S denote the mean and the standard deviation of the observed
$x_i$'s and $t_\alpha$ is taken, for example, from Fisher's Tables [2] with his P = 2α.
In other words $t_\alpha$ is such that

(2)  $P\{t > t_\alpha \mid H_0\} = \alpha,$

where α is the chosen level of significance. In accordance with the definition
of the uniformly most powerful test, whenever any other rule, R, offered to test
the same hypothesis $H_0$ has the same probability α of $H_0$ being rejected when
it is true, the power of this alternative test cannot exceed that of "Student's"
test. In other words, if it happens that the true value of $\xi$ is not equal to $\xi_0$
but is greater, then the probability of this circumstance being detected by
"Student's" test is at least equal to that corresponding to the rule R.

If the set of alternative hypotheses is not limited to those specifying the
value of $\xi$ either greater or smaller than $\xi_0$, but includes both those categories,
then it is known, [1], that there is no uniformly most powerful test of the
hypothesis $H_0$. However, in this case there exists a slightly different test, also
based on "Student's" criterion t, possessing the remarkable property of being
unbiased of type $B_1$, [3]. The test, in common use for a long time, consists in
rejecting $H_0$ when

(3)  $|t| > t_\alpha,$

with $t_\alpha$ being taken again from Fisher's tables, this time corresponding to his
P = α, where α is the chosen level of significance.

In order to describe the optimum property of this test we must use the
concept of the power function of a test, [3]. Denote by $\beta(\xi, \sigma)$ the probability of
the hypothesis $H_0$ being rejected when $\xi$ and $\sigma$ are the true mean and the true
standard error of the observable $x_i$'s. The function $\beta(\xi, \sigma)$ is just what is
called the power function of the test. If we substitute $\xi = \xi_0$, then we shall
have $\beta(\xi_0, \sigma) = \alpha$ irrespective of the value of $\sigma$. Now the optimum property
of "Student's" test mentioned above consists in that (1) its power function
has a minimum at $\xi = \xi_0$ and this is true whatever be the value of $\sigma$, (2)
whatever be any other test of the same hypothesis which has the same level of
significance α and has property (1), its power function $\beta'(\xi, \sigma)$ cannot exceed that
of "Student's" test.

These two properties, demonstrating the excellence of the criterion suggested
by "Student," fully justify the general confidence in the test as described above,
or in its extended form where it is applied to two or more samples. However,
it is known that "Student's" test in both its forms, $t > t_\alpha$ and $|t| > t_\alpha$, has
one very undesirable property which causes great difficulties in various problems
of rational planning of experiments.

One of the most important questions to have in mind when planning an
experiment is: What is the probability that the experiment and the subsequent
statistical test will detect a difference or effect when it actually exists? If we
perform an experiment and then apply some statistical analysis to test
"Student's" hypothesis that $\xi = \xi_0$, we do hope that, if the actual value of $\xi$
is different from $\xi_0$, the test will discover this circumstance. But apart from
mere hope, it is desirable to take precautions so that when the difference,
$\xi - \xi_0 = \Delta$, has some appreciable value, the chance of the hypothesis $H_0$ being
rejected will be reasonably large. This may be done by calculating the value
of the power function $\beta(\xi, \sigma)$ corresponding to the value $\xi = \xi_0 + \Delta$. And
here we come to the unfortunate property of "Student's" test.

Although the form of the power function of "Student's" test is known and
tabled [4], [5], [6], [7], there are occasionally considerable difficulties in applying
these tables, because it appears that the values n and Δ are not all its arguments,
for it also depends on $\sigma$. Consequently, in order to have an idea of the
probability that the test will detect the falsehood of the hypothesis $H_0$ that $\xi = \xi_0$
when actually $\xi = \xi_0 + \Delta$, we need not only the knowledge of n but also a
likely value of $\sigma$. The latter is known accurately only in exceptional cases and
then in those cases one would apply a test which is different from "Student's"
test. Usually we have only a vague notion of the magnitude of $\sigma$ and
accordingly the tables of $\beta(\xi, \sigma)$ may be used to obtain a rough idea as to whether
the arrangement of the experiment planned is satisfactory or not. Frequently
we have no idea of what may be the values of $\sigma$.

To Dr. P. L. Hsu is due the idea of looking for tests, the power of which is
independent of the parameters unspecified by the hypothesis tested. In an
unpublished paper, he proved among other things that the λ test of the general
linear hypothesis is the most powerful of all those, the power function of which
depends on the same argument as that of the λ test and not on other parameters.
The above circumstances suggest the following problem: to see whether it is
possible to devise a test of "Student's" hypothesis such that its power function
would be independent of $\sigma$. If such a test could be devised and proved to be
reasonably powerful then the tables of its power function could be used for the
purpose of planning experiments.

The purpose of the present paper is to show that no such test exists and,
consequently, this negative result implies in still another way that it is
impossible to improve on the test originally suggested by "Student."

2. Statement of the Problem. The problem of finding a test whose power
function is independent of $\sigma$ is equivalent to finding a critical region w such
that the value of the power function

(4)  $\beta(\xi, \sigma) = P\{E \in w \mid \xi, \sigma\}$

for any fixed $\xi$ is independent of the value of $\sigma$, where E denotes the sample
point $(x_1, x_2, \ldots, x_n)$. We shall show specifically that if this is the case, then
the power function is also independent of $\xi$, so that the test will reject the
hypothesis tested with the same frequency independently of whether it be correct
or wrong.

3. Theorem. If there exists a region w such that, whatever be the value of $\sigma$,

(5)  $\left(\dfrac{1}{\sigma\sqrt{2\pi}}\right)^{n}\int\cdots\int_{w} e^{-\frac{1}{2\sigma^{2}}\sum_{i=1}^{n}(x_i-\xi_0)^{2}}\,dx_1\,dx_2\cdots dx_n = \alpha,$

(6)  $\left(\dfrac{1}{\sigma\sqrt{2\pi}}\right)^{n}\int\cdots\int_{w} e^{-\frac{1}{2\sigma^{2}}\sum_{i=1}^{n}(x_i-\xi_1)^{2}}\,dx_1\,dx_2\cdots dx_n = \beta,$

where $\xi_0 \neq \xi_1$, α, β are constants, then

(7)  $\alpha = \beta.$

A region w is called similar [1] to the whole sample space, W, of size α, with
respect to a set of elementary probability laws $p(E \mid \theta)$ given in terms of a
parameter θ, if $P\{E \in w \mid \theta\} = \alpha$, whatever be the value of θ. Essentially,
then, the region w above is a similar region with respect to two different sets
of elementary laws, each being given parametrically in terms of the parameter $\sigma$.

Denote by $w_r$ the portion of the surface of the hypersphere, $\sum_{i=1}^{n}(x_i - \xi_0)^{2} = r^{2}$,
which is common to w, and let the total surface be denoted by $W_r$. Neyman
and Pearson have shown [1] that a necessary and sufficient condition that w
be a similar region, in the above case, is that, whatever be r, the probability
that the sample point E will fall on the subsurface $w_r$, when it is known that
the sample point lies on the surface $W_r$, is α, i.e.

(8)  $P\{E \in w_r \mid (E \in W_r)(\xi = \xi_0)\} = \alpha$

for all r.

In a similar manner let $w_\rho$ denote the portion of the surface of the
hypersphere $\sum_{i=1}^{n}(x_i - \xi_1)^{2} = \rho^{2}$ common to w, and let the total surface be denoted
by $W_\rho$. Since w is similar to the set of probability laws indicated in (6), we
have also

(9)  $P\{E \in w_\rho \mid (E \in W_\rho)(\xi = \xi_1)\} = \beta$

for all ρ.

Since on the surface $W_r$ the elementary probability law,

(10)  $\left(\dfrac{1}{\sigma\sqrt{2\pi}}\right)^{n} e^{-\frac{1}{2\sigma^{2}}\sum(x_i-\xi_0)^{2}} = \left(\dfrac{1}{\sigma\sqrt{2\pi}}\right)^{n} e^{-\frac{r^{2}}{2\sigma^{2}}},$

is constant, we see that an equivalent statement of (8) is that the hyper-area of
$w_r$ is a constant proportion, α, of the total hyper-area of $W_r$. Similarly, from (9),
we have that the hyper-area of $w_\rho$ is a constant proportion, β, of the area of the
hypersurface $W_\rho$, whatever be the values of r and ρ.

Consider the transformation which expresses $x_1, x_2, \ldots, x_n$ in terms of
generalized polar coordinates with pole at the point $(\xi_0, \xi_0, \ldots, \xi_0)$, i.e.

(11)
$x_1 - \xi_0 = r\cos\theta_2\cos\theta_3\cdots\cos\theta_{n-2}\cos\theta_{n-1}\cos\theta_n$
$x_2 - \xi_0 = r\cos\theta_2\cos\theta_3\cdots\cos\theta_{n-2}\cos\theta_{n-1}\sin\theta_n$
$x_3 - \xi_0 = r\cos\theta_2\cos\theta_3\cdots\cos\theta_{n-2}\sin\theta_{n-1}$
$\cdots$
$x_{n-1} - \xi_0 = r\cos\theta_2\sin\theta_3$
$x_n - \xi_0 = r\sin\theta_2$

Let Δ be the Jacobian of the transformation:

(12)  $|\Delta| = r^{n-1}\prod_{i=2}^{n}\cos^{\,i-2}\theta_{n+2-i} = r^{n-1}\,T(\theta_i).$

Consider also a transformation which expresses $(x_1, x_2, \ldots, x_n)$ in terms of polar
coordinates, the point $(\xi_1, \xi_1, \ldots, \xi_1)$ being pole. It may be obtained by
replacing in (11) $\xi_0$ by $\xi_1$, r by ρ, and $\theta_i$ by $\theta_i'$. The Jacobian of this
transformation is given by $|\Delta'| = \rho^{n-1}\,T(\theta_i')$.

We are now able to express the hyper-area of $W_r$:

(13)  $\int\cdots\int_{W_r}|\Delta|\,d\theta_2\,d\theta_3\cdots d\theta_n = r^{n-1}\int\cdots\int_{W_r} T(\theta_i)\,d\theta_2\cdots d\theta_n = K r^{n-1},$

where the integral K > 0 is a constant independent of r. Similarly the
hyper-area of $W_\rho$ is $K\rho^{n-1}$, where K is the same as in (13). According to (8) and
(9) we have, now,

(14)  $\int\cdots\int_{w_r}|\Delta|\,d\theta_2\,d\theta_3\cdots d\theta_n = \alpha\cdot K\cdot r^{n-1},$

(15)  $\int\cdots\int_{w_\rho}|\Delta'|\,d\theta_2'\,d\theta_3'\cdots d\theta_n' = \beta\cdot K\cdot\rho^{n-1}.$

Let us consider the distances between the three points: $(x_1, x_2, \ldots, x_n)$,
$(\xi_0, \xi_0, \ldots, \xi_0)$, and $(\xi_1, \xi_1, \ldots, \xi_1)$. The distances of the first point to the
second point and to the third point we have already denoted by r and ρ. Let
the distance between the last two be L; then, since the sum of two sides is at least
equal to the third side of a triangle, we have

(16)  $r \le \rho + L, \quad \rho \le r + L, \quad$ where $L = \sqrt{n}\,|\xi_0 - \xi_1|.$

Let $\varphi(t) \ge 0$ be an arbitrary monotonic nonincreasing function of t, such that
the product $t^{n-1}\varphi(t)$ is integrable from 0 to $+\infty$. Since $\varphi(t)$ is a decreasing
function it follows from (16) that

(17)  $\varphi(r) \ge \varphi(\rho + L)$ and $\varphi(\rho) \ge \varphi(r + L).$

Consider the integral I:

(18)  $I = \int\cdots\int_{w}\varphi(r)\,dx_1\,dx_2\cdots dx_n.$

We shall express it in terms of the variables $r, \theta_2, \ldots, \theta_n$ and also in terms of
$\rho, \theta_2', \ldots, \theta_n'$ and compare the results. Thus

(19)  $I = \int\cdots\int_{w}|\Delta|\,\varphi(r)\,dr\,d\theta_2\cdots d\theta_n = \int_0^{\infty}\varphi(r)\,dr\int\cdots\int_{w_r}|\Delta|\,d\theta_2\cdots d\theta_n = \alpha\cdot K\cdot\int_0^{\infty} r^{n-1}\varphi(r)\,dr.$

Also we have by (16)

(20)  $I = \int\cdots\int_{w}|\Delta'|\,\varphi(r)\,d\rho\,d\theta_2'\cdots d\theta_n' \ge \int\cdots\int_{w}|\Delta'|\,\varphi(\rho + L)\,d\rho\,d\theta_2'\cdots d\theta_n' = \int_0^{\infty}\varphi(\rho + L)\,d\rho\int\cdots\int_{w_\rho}|\Delta'|\,d\theta_2'\cdots d\theta_n'$

and consequently

(21)  $I \ge \beta\cdot K\int_0^{\infty}\rho^{n-1}\varphi(\rho + L)\,d\rho.$

Since K > 0, we have from (19) and (21)

(22)  $\alpha/\beta \ge \int_0^{\infty} t^{n-1}\varphi(t + L)\,dt \Big/ \int_0^{\infty} t^{n-1}\varphi(t)\,dt.$

By interchanging ρ and r in (18), (19), (20), and (21) we have also

(23)  $\beta/\alpha \ge \int_0^{\infty} t^{n-1}\varphi(t + L)\,dt \Big/ \int_0^{\infty} t^{n-1}\varphi(t)\,dt.$

Let us set in (22) and (23) $\varphi(t) = e^{-pt}$ and $\varphi(t + L) = e^{-pL}e^{-pt}$, where p > 0
is arbitrary. Then

(24)  $\alpha/\beta \ge e^{-pL}$ and $\beta/\alpha \ge e^{-pL}.$

Since (24) holds for all p > 0, let p approach zero. Then $\lim e^{-pL} = 1$, and
the above inequalities can hold only if

(25)  $\alpha = \beta$,  Q.E.D.

It is of interest to note that there do exist regions such that the power
function is independent of both $\xi$ and $\sigma$. For example, let $S_n$ be the standard
deviation of the observed values $(x_1, x_2, \ldots, x_n)$ and let $S_{n-1}$ be the standard
deviation of the values $(x_1, x_2, \ldots, x_{n-1})$; then the region w given by all
points $(x_1, x_2, \ldots, x_n)$ which satisfy the inequality $(S_{n-1}/S_n) \le C$ is such a
region, i.e.

(26)  $P\{(S_{n-1}/S_n) \le C \mid \xi, \sigma\}$

is constant, whatever be the values of $\xi$ and $\sigma$. Such regions are, however,
unsuitable for testing "Student's" hypothesis $\xi = \xi_0$, because they will reject
this hypothesis when it is wrong and when it is correct with equal frequency.
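One way to see why the probability in (26) cannot depend on $\xi$ or $\sigma$ is that the ratio $S_{n-1}/S_n$ is unchanged when every observation is shifted and rescaled. The sketch below only illustrates that invariance; the names `sd` and `ratio` are hypothetical, and the standard deviation is taken with divisor equal to the number of values, which is an assumption (any common divisor convention cancels in the same way up to a constant factor).

```python
import math


def sd(xs):
    """Standard deviation with divisor len(xs) (assumed convention)."""
    m = sum(xs) / len(xs)
    return math.sqrt(sum((x - m) ** 2 for x in xs) / len(xs))


def ratio(xs):
    """S_{n-1} / S_n: deviation of the first n-1 values over that of all n."""
    return sd(xs[:-1]) / sd(xs)
```

Replacing each $x_i$ by $a + b\,x_i$ with b > 0 multiplies both deviations by b, so `ratio` returns the same value; a change of $\xi$ and $\sigma$ acts on normal samples in exactly this way.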


The author is indebted to Professor J. Neyman for assistance in preparing 
the present paper. 
A METHOD FOR RECURRENT COMPUTATION OF ALL THE
PRINCIPAL MINORS OF A DETERMINANT, AND ITS
APPLICATION IN CONFLUENCE ANALYSIS


By Olav Reiersøl

1. Recurrent computation of all the principal minors of a determinant. 

The formulae which I develop in this paper have been worked out for use in 
statistical confluence analysis, By means of recurrent computation they shorten 
considerably the amount of work required to compute all principal minors of a 
square matrix. Originally I elaborated this method as a simplification of one 
given by Frisch (not published). 

Subsequently I found that the method could more easily be deduced from the 
pivotal method. This method has been described, for example, by Whittaker 
and Robinson [5] and by Aitken [1]. 

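Let us consider a square n-rowed matrix

(1)  $\begin{Vmatrix} a_{11} & a_{12} & \cdots & a_{1n} \\ a_{21} & a_{22} & \cdots & a_{2n} \\ \vdots & \vdots & & \vdots \\ a_{n1} & a_{n2} & \cdots & a_{nn} \end{Vmatrix}$

Let the adjoint of this matrix be $\|p_{ij}\|$ and let us denote its determinant
value by $D_{12\cdots n}$.

Then we have the following identity

(2)  $\begin{vmatrix} p_{n-1,n-1} & p_{n-1,n} \\ p_{n,n-1} & p_{n,n} \end{vmatrix} = D_{12\cdots n}\,D_{12\cdots n-2}.$

As Aitken points out, the pivotal method is based upon this identity.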

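Next consider the following matrix which is formed from the matrix (1) by
striking out the nth row and the (n − 1)th column:

(3)  $\begin{Vmatrix} a_{11} & \cdots & a_{1,n-2} & a_{1,n} \\ \vdots & & \vdots & \vdots \\ a_{n-2,1} & \cdots & a_{n-2,n-2} & a_{n-2,n} \\ a_{n-1,1} & \cdots & a_{n-1,n-2} & a_{n-1,n} \end{Vmatrix}$

Let us denote its adjoint by $\|q_{ij}\|$, its determinant value by $A_{12\cdots n}$.

The determinant

$\begin{vmatrix} a_{11} & \cdots & a_{1,n-2} & a_{1,n-1} \\ \vdots & & \vdots & \vdots \\ a_{n-2,1} & \cdots & a_{n-2,n-2} & a_{n-2,n-1} \\ a_{n,1} & \cdots & a_{n,n-2} & a_{n,n-1} \end{vmatrix}$

we shall denote by $B_{12\cdots n}$.

The identity (2) can now be written

(2')  $D_{12\cdots n} = \dfrac{D_{12\cdots n-2,n}\,D_{12\cdots n-2,n-1} - A_{12\cdots n}\,B_{12\cdots n}}{D_{12\cdots n-2}}.$

If we apply the identity (2) to the matrix (3) we get

$\begin{vmatrix} q_{n-2,n-2} & q_{n-2,n-1} \\ q_{n-1,n-2} & q_{n-1,n-1} \end{vmatrix} = A_{12\cdots n}\,D_{12\cdots n-3},$

which may also be written

(4)  $A_{12\cdots n} = \dfrac{A_{12\cdots n-3,n-1,n}\,D_{12\cdots n-2} - A_{12\cdots n-3,n-2,n}\,B_{12\cdots n-1}}{D_{12\cdots n-3}}.$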

To simplify the notation we will not write the affixes present, but write the
affixes not present in inverted parentheses. Then our formulae (2') and (4)
can be written

$D = \dfrac{D_{)n-1(}\,D_{)n(} - AB}{D_{)n-1,n(}}$

$A = \dfrac{A_{)n-2(}\,D_{)n-1,n(} - A_{)n-1(}\,B_{)n(}}{D_{)n-2,n-1,n(}}$

In an analogous way we get

$B = \dfrac{B_{)n-2(}\,D_{)n-1,n(} - B_{)n-1(}\,A_{)n(}}{D_{)n-2,n-1,n(}}$

We may apply these formulae to an arbitrary principal minor $D_{v_1 v_2\cdots v_k}$.
Let us now denote $D_{v_1 v_2\cdots v_k}$ by D and denote the absence of one or more of the
numbers $v_1, v_2, \ldots, v_k$ by placing them into inverted parentheses. We then
have the formulae:

(5a)  $A = \dfrac{A_{)v_{k-2}(}\,D_{)v_{k-1},v_k(} - A_{)v_{k-1}(}\,B_{)v_k(}}{D_{)v_{k-2},v_{k-1},v_k(}}$

(5b)  $B = \dfrac{B_{)v_{k-2}(}\,D_{)v_{k-1},v_k(} - B_{)v_{k-1}(}\,A_{)v_k(}}{D_{)v_{k-2},v_{k-1},v_k(}}$

(5c)  $D = \dfrac{D_{)v_{k-1}(}\,D_{)v_k(} - AB}{D_{)v_{k-1},v_k(}}$
By means of these formulae we can recurrently compute all principal minors.
We begin with $D_i = a_{ii}$, $i = 1, 2, \ldots, n$, $A_{ij} = a_{ij}$, $B_{ij} = a_{ji}$, where i < j.
Then we compute the D's with two affixes,

$D_{ij} = D_i D_j - A_{ij}B_{ij},$

and then the quantities A, B, D with three affixes,

$A_{ijk} = A_{jk}D_i - A_{ik}B_{ij}$

$B_{ijk} = B_{jk}D_i - B_{ik}A_{ij}$

$D_{ijk} = \dfrac{D_{ik}D_{ij} - A_{ijk}B_{ijk}}{D_i}, \quad i < j < k.$

Then we compute the quantities A, B, D with four affixes, and so on.
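The recurrence (5a) to (5c) can be sketched in a few lines. The code below is an illustration under stated assumptions rather than a transcription of the author's computing scheme: the names `principal_minors` and `det` are hypothetical, affixes are 0-based tuples, exact fractions are used, the empty minor is taken as 1, and every minor that appears as a divisor is assumed to be non-zero. The helper `det` is a plain Laplace expansion included only to check the recurrence on small matrices.

```python
from fractions import Fraction
from itertools import combinations


def principal_minors(a):
    """Return {affix tuple: principal minor D} for all subsets, via (5a)-(5c)."""
    n = len(a)
    a = [[Fraction(x) for x in row] for row in a]
    D = {(): Fraction(1)}
    A, B = {}, {}
    for i in range(n):
        D[(i,)] = a[i][i]
    for k in range(2, n + 1):
        for v in combinations(range(n), k):
            if k == 2:
                i, j = v
                A[v], B[v] = a[i][j], a[j][i]
            else:
                no_k2 = v[:-3] + v[-2:]   # affixes with v_{k-2} removed
                no_k1 = v[:-2] + v[-1:]   # affixes with v_{k-1} removed
                no_k = v[:-1]             # affixes with v_k removed
                d3 = D[v[:-3]]            # D with the last three affixes removed
                A[v] = (A[no_k2] * D[v[:-2]] - A[no_k1] * B[no_k]) / d3
                B[v] = (B[no_k2] * D[v[:-2]] - B[no_k1] * A[no_k]) / d3
            D[v] = (D[v[:-2] + v[-1:]] * D[v[:-1]] - A[v] * B[v]) / D[v[:-2]]
    return D


def det(m):
    """Determinant by Laplace expansion along the first row (check only)."""
    if not m:
        return Fraction(1)
    return sum((-1) ** j * m[0][j] * det([row[:j] + row[j + 1:] for row in m[1:]])
               for j in range(len(m)))
```

Each new A, B or D costs a fixed number of multiplications and one division, which is the saving the recurrent scheme is meant to provide over evaluating every minor from scratch.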

If we carry through the computations without dropping any figures we have
as a control that all divisions will be exact without remainder. If we are
dropping figures we can control the result by computing the determinant
$D_{12\cdots n}$ in another way. If we wish to control the computation before it is
completed, we may use our recurrence formulae on the matrix which we get from
the original matrix when the rows and the columns are subjected to the same
permutation. For example we can reverse the order of the rows and columns.
Then we can control the (k − 1)-rowed minors before computing the k-rowed
minors.

If all the D's are different from zero, we may reduce the necessary number of
multiplications and divisions in the following way. We introduce the following
notations:

$$
d = \frac{D}{D_{)v_k(}}, \qquad
a = \frac{A}{D_{)v_{k-1} v_k(}}, \qquad
b = \frac{B}{D_{)v_{k-1} v_k(}}, \qquad
c = -\frac{b}{d_{)v_k(}}.
$$

Substituting in (5), we get the following system of recurrence formulae:

$$ a = a_{)v_{k-2}(} + a_{)v_{k-1}(}\, c_{)v_k(} \tag{6a} $$

$$ b = b_{)v_{k-2}(} + a_{)v_k(}\, c_{)v_{k-1}(} \tag{6b} $$

$$ c = -\frac{b}{d_{)v_k(}} \tag{6c} $$

$$ d = d_{)v_{k-1}(} + ac \tag{6d} $$

$$ D = D_{)v_k(}\, d. \tag{6e} $$

An affix $v_k$ on a letter indicates the deletion of the last row and column in the
determinants making up the definition of that letter, even though those
determinants are of lower order than $k$. Similarly, an affix $v_{k-1}$ indicates the
deletion of the next to the last row and column.

The a's with two affixes in these formulae are identical with the elements $a_{ij}$
of the matrix (1) where $i < j$. Further, $b_{ij} = a_{ji}$, $i < j$, $d_i = a_{ii}$. Applying
the recurrence formulae (6) we start with these values.

If the matrix (1) is symmetric, i.e. if $a_{ij} = a_{ji}$, then we get

$$ B_{v_1 v_2 \cdots v_k} = A_{v_1 v_2 \cdots v_k} $$

and

$$ b_{v_1 v_2 \cdots v_k} = a_{v_1 v_2 \cdots v_k}. $$

In this case we can therefore replace B by A in the formulae (5) and replace b
by a in the formulae (6).
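The recurrence (6) can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the author's computing scheme: the names `principal_minors` and `det` are hypothetical, indices are zero-based, exact rational arithmetic (`fractions.Fraction`) is used so that no figures are dropped, and every divisor d is assumed to be different from zero, as the text requires.

```python
from fractions import Fraction
from itertools import combinations


def principal_minors(matrix):
    """Return {index tuple: principal minor D} using recurrence (6).

    a, b, c, d are stored for every ordered index subset; D is built as
    D = D(subset without its last index) * d.  Assumes no d is zero.
    """
    n = len(matrix)
    a, b, c, d, D = {}, {}, {}, {}, {}
    for i in range(n):
        d[(i,)] = D[(i,)] = Fraction(matrix[i][i])
    for k in range(2, n + 1):
        for s in combinations(range(n), k):
            drop_last = s[:-1]                  # affix v_k absent
            drop_prev = s[:-2] + s[-1:]         # affix v_{k-1} absent
            if k == 2:
                i, j = s
                a[s] = Fraction(matrix[i][j])   # starting values a_ij
                b[s] = Fraction(matrix[j][i])   # starting values b_ij = a_ji
            else:
                drop_third = s[:-3] + s[-2:]    # affix v_{k-2} absent
                a[s] = a[drop_third] + a[drop_prev] * c[drop_last]   # (6a)
                b[s] = b[drop_third] + a[drop_last] * c[drop_prev]   # (6b)
            c[s] = -b[s] / d[drop_last]                               # (6c)
            d[s] = d[drop_prev] + a[s] * c[s]                         # (6d)
            D[s] = D[drop_last] * d[s]                                # (6e)
    return D


def det(m):
    """Direct determinant by Laplace expansion, used only as a check."""
    if len(m) == 1:
        return Fraction(m[0][0])
    total = Fraction(0)
    for j, x in enumerate(m[0]):
        minor = [row[:j] + row[j + 1:] for row in m[1:]]
        total += (-1) ** j * x * det(minor)
    return total


if __name__ == "__main__":
    example = [[2, 1, 0, 1], [1, 3, 1, 0], [0, 2, 4, 1], [1, 0, 1, 5]]
    for subset, value in sorted(principal_minors(example).items()):
        print(subset, value)
```

Each minor with three or more rows costs four multiplications and one division here, which is the count used in section 2 below. For a symmetric matrix the dictionary `b` duplicates `a` and could be dropped.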

Numerical example. Let us compute all the scatterances in the constructed
example given by Frisch [3, p. 121]. The correlation matrix in this example is:

     1.000000   -0.121551    0.656809    0.752502   -0.224549
    -0.121551    1.000000    0.657698   -0.732862    0.212165
     0.656809    0.657698    1.000000    0.014385   -0.040183
     0.752502   -0.732862    0.014385    1.000000   -0.280223
    -0.224549    0.212165   -0.040183   -0.280223    1.000000

Using our recurrence formulae (6) we get the following table:

    Affixes        a            c            d            D
    12        -0.121 551    0.121 551    0.985 225    0.985 225
    13         0.656 809   -0.656 809    0.568 602    0.568 602
    23         0.657 698   -0.657 698    0.567 433    0.567 433
    14         0.752 502   -0.752 502    0.433 741    0.433 741
    24        -0.732 862    0.732 862    0.462 913    0.462 913
    34         0.014 385   -0.014 385    0.999 793    0.999 793
    15        -0.224 549    0.224 549    0.949 578    0.949 578
    25         0.212 165   -0.212 165    0.954 986    0.954 986
    35        -0.040 183    0.040 183    0.998 385    0.998 385
    45        -0.280 223    0.280 223    0.921 475    0.921 475
    123        0.737 534   -0.748 594    0.016 489    0.016 245
    124       -0.641 395    0.651 014    0.016 184    0.015 945
    134       -0.479 865    0.843 938    0.028 765    0.016 356
    234        0.496 387   -0.874 794    0.028 677    0.016 272
    125        0.184 871   -0.187 643    0.914 888    0.901 371
    135        0.107 303   -0.188 714    0.929 328    0.528 418
    235       -0.179 723    0.316 730    0.898 062    0.509 590
    145       -0.111 249    0.256 487    0.921 044    0.399 405
    245       -0.124 735    0.269 457    0.921 272    0.426 516
    345       -0.279 645    0.279 703    0.920 167    0.919 977
    1234       0.000 279   -0.016 6      0.016 179    0.000 262 83
    1235      -0.031 090    1.885 5      0.856 268    0.013 910
    1245       0.009 105   -0.562 6      0.909 766    0.014 506
    1345      -0.020 692    0.719 35     0.914 443    0.014 957
    2345       0.032 486   -1.132 8      0.861 262    0.014 014
    12345      0.009 621   -0.594 7      0.850 546    0.000 223 55


2. Computation of the coefficients of the characteristic polynomial of a
matrix. The characteristic polynomial of the matrix (1) is

$$
P(\lambda) =
\begin{vmatrix}
a_{11} - \lambda & a_{12} & \cdots & a_{1n} \\
a_{21} & a_{22} - \lambda & \cdots & a_{2n} \\
\vdots & \vdots & & \vdots \\
a_{n1} & a_{n2} & \cdots & a_{nn} - \lambda
\end{vmatrix}
= P_n - P_{n-1}\lambda + P_{n-2}\lambda^2 - \cdots + (-1)^n \lambda^n .
$$

As is well known, the coefficient $P_k$ can be calculated as the sum of all the
k-rowed principal minors of the matrix (1). Our method of computing all the
principal minors of a matrix therefore gives us as a by-product a method of
computing the coefficients of the characteristic polynomial. Another method
for the determination of these coefficients has been given by Paul Horst [4].
We may obtain a comparison between the work of computation entailed by
the two methods by calculating the number of multiplications and divisions
necessary when using one or the other method. If our recurrence formulae (6)
are used, two multiplications and one division are necessary for computing a
2-rowed minor, and 4 multiplications and one division for every minor with 3
or more rows. Consequently the total number of multiplications and divisions
will be

$$
S_n = 3\binom{n}{2} + 5\sum_{k=3}^{n}\binom{n}{k} = 5 \cdot 2^n - (n^2 + 4n + 5).
$$

On using Horst's method, the number of necessary multiplications and
divisions will be found to be

$$
H_n = (\tfrac{1}{2}n - 1)n^3 + \tfrac{1}{2}n^3 + \tfrac{1}{2}(n - 1)(n + 2)
    = \tfrac{1}{2}(n - 1)(n^3 + n + 2), \qquad n \text{ even},
$$

$$
H_n = \tfrac{1}{2}(n - 1)(n^3 + n^2 + n + 2), \qquad n \text{ odd}.
$$

When n = 2, 3, ..., 12, $S_n$ and $H_n$ acquire the following values:

     n      S_n      H_n
     2        3        6
     3       14       41
     4       43      105
     5      110      314
     6      255      560
     7      558     1203
     8     1179     1827
     9     2438     3284
    10     4975     4554
    11    10070     7325
    12    20283     9581

We see that our method of computing the coefficients of the characteristic
polynomial involves less calculation when n < 10, while Horst's method is
superior when n ≥ 10.
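The two operation counts are easy to tabulate. The sketch below simply evaluates the closed forms quoted above; the function names `ops_recurrence`, `ops_recurrence_closed` and `ops_horst` are hypothetical labels chosen for this illustration.

```python
from math import comb


def ops_recurrence(n):
    """Multiplications and divisions with recurrence (6):
    3 per 2-rowed minor, 5 per minor with 3 or more rows."""
    return 3 * comb(n, 2) + 5 * sum(comb(n, k) for k in range(3, n + 1))


def ops_recurrence_closed(n):
    """Closed form S_n = 5 * 2^n - (n^2 + 4n + 5)."""
    return 5 * 2 ** n - (n * n + 4 * n + 5)


def ops_horst(n):
    """H_n as quoted in the text, with separate forms for even and odd n."""
    if n % 2 == 0:
        poly = n ** 3 + n + 2
    else:
        poly = n ** 3 + n ** 2 + n + 2
    return (n - 1) * poly // 2   # (n - 1) * poly is always even


if __name__ == "__main__":
    for n in range(2, 13):
        print(n, ops_recurrence(n), ops_horst(n))
```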

If our purpose is to find the characteristic roots of the matrix we can do this
with less computation without first finding the coefficients of the
characteristic polynomial. See Aitken [2].

3. Applications in confluence analysis. The confluence analysis of Frisch is
set forth in his book, “Statistical Confluence Analysis by Means of Complete
Regression Systems” [3].

The main method of this book is the “bunch analysis,” which includes the 
computation of the adjoints of the correlation matrices of all sets of variates 
contained in the total set. In section 1, Frisch has described a preliminary 
analysis by means of scatterances. The scatterances are the principal minors 
of the correlation matrix of the total set of variates. If we carry through such 
an analysis, the recurrence formulae of section 1 of this paper will give a rapid 
method for the calculation of all the scatterances. 

Another application of the computation of all the scatterances arises in the
determination of the correct time lags between variates in a structural equation.
This problem will be treated in a paper on confluence analysis which will appear
in the near future.

NOTES

This section is devoted to brief research and expository articles, notes on methodology
and other short items.


A CRITERION FOR TESTING THE HYPOTHESIS THAT TWO 
SAMPLES ARE FROM THE SAME POPULATION 

By W. J. Dixon 

1. Introduction. The purpose of this paper is to consider a criterion for
testing the hypothesis that two samples have been drawn from populations with
the same distribution function, assuming only that the cumulative distribution
function common to the two populations is continuous. Let the two samples,
$O_n$ and $O_m$, be of size n and m respectively. We may assume n ≤ m without
loss of generality. Suppose the elements $u_1, \cdots, u_n$ of $O_n$ are arranged in order
from the smallest to the largest, that is, $u_1 < u_2 < \cdots < u_n$. These may be
represented as points along a line. The elements of $O_m$ represented as points
on the same line are then divided into (n + 1) groups by the first sample, $O_n$.
Let $m_1$ be the number of points having a value less than $u_1$, $m_{i+1}$ the number
lying between $u_i$ and $u_{i+1}$ (i = 1, 2, ..., n - 1), and $m_{n+1}$ the number greater than
$u_n$ ($m_{n+1} = m - m_1 - m_2 - \cdots - m_n$). The criterion here proposed is [1]

$$
C^2 = \sum_{i=1}^{n+1} \left( \frac{1}{n+1} - \frac{m_i}{m} \right)^2. \tag{1}
$$

[1] A similar criterion for two samples of the same size was investigated
(unpublished) by A. M. Mood. He found the mean and variance to be

$$
E(d^2) = \frac{2n + 1}{3n}, \qquad
\sigma^2_{d^2} = \frac{8(n - 1)(2n + 1)}{45 n^2}.
$$

It can be seen that this is the sum of the squares of the differences between the ordinates
of the two cumulative sample distributions calculated at the jumps of the first sample
distribution.

2. The mean and variance of $C^2$. The only case of continuous cumulative
distribution functions F(x) of any interest in statistics is that in which dF(x) =
f(x) dx, where f(x) is a probability density function. Let us write:

$$
p_1 = \int_{-\infty}^{u_1} f(x)\,dx, \quad
p_2 = \int_{u_1}^{u_2} f(x)\,dx, \quad \cdots, \quad
p_{n+1} = \int_{u_n}^{\infty} f(x)\,dx,
$$

where of course $p_{n+1} = 1 - p_1 - p_2 - \cdots - p_n$.

Now, the joint distribution law of the $p_i$ is

$$ P(p_1, \cdots, p_n) = n!\, dp_1 \cdots dp_n \tag{2} $$

and the conditional distribution of the $m_i$ given the $p_i$ is

$$
P(m_1, \cdots, m_{n+1} \mid p_1, \cdots, p_n) = \frac{m!}{m_1! \cdots m_{n+1}!}\, p_1^{m_1} p_2^{m_2} \cdots p_{n+1}^{m_{n+1}}. \tag{3}
$$

Therefore the joint probability law of the $m_i$ and $p_i$ is

$$
P(m, p) = \frac{n!\, m!}{m_1! \cdots m_{n+1}!}\, p_1^{m_1} p_2^{m_2} \cdots p_{n+1}^{m_{n+1}}\, dp_1 \cdots dp_n. \tag{4}
$$

Let

$$
\varphi(\theta) = E\left[ \exp \sum_{i=1}^{n+1} \theta_i \left( \frac{1}{n+1} - \frac{m_i}{m} \right) \right],
$$

so that

$$
E(C^2) = \sum_{i=1}^{n+1} \left. \frac{\partial^2 \varphi}{\partial \theta_i^2} \right|_{\theta = 0}, \tag{5}
$$

$$
E[(C^2)^2] = \sum_{i=1}^{n+1} \sum_{j=1}^{n+1} \left. \frac{\partial^4 \varphi}{\partial \theta_i^2\, \partial \theta_j^2} \right|_{\theta = 0}, \tag{6}
$$

and

$$
\varphi(\theta) = \sum_m \int \exp\left[ \sum_{i=1}^{n+1} \theta_i \left( \frac{1}{n+1} - \frac{m_i}{m} \right) \right] P(m, p), \tag{7}
$$

where $\sum_m$ denotes the usual multinomial summation over all integral values of
$m_i \ge 0$ for which $\sum m_i = m$ and the integration is over the generalized
tetrahedron defined by $p_i \ge 0$ and $p_1 + p_2 + \cdots + p_n \le 1$. If we perform
the summation first, we obtain

$$
\varphi(\theta) = n!\, e^{\frac{1}{n+1} \sum_i \theta_i} \int \left( p_1 e^{-\theta_1/m} + \cdots + p_{n+1} e^{-\theta_{n+1}/m} \right)^m dp_1 \cdots dp_n. \tag{8}
$$

Differentiating twice with respect to $\theta_i$ and setting the θ's equal to zero, we get

$$
\left. \frac{\partial^2 \varphi}{\partial \theta_i^2} \right|_{\theta = 0}
= n! \int \left[ \left( \frac{1}{n+1} \right)^2 - \frac{2 p_i}{n+1} + \frac{p_i}{m} + \frac{m - 1}{m}\, p_i^2 \right] dp_1 \cdots dp_n.
$$

If we now integrate and sum from one to n + 1, we find

$$
E(C^2) = \frac{n(m + n + 1)}{m(n + 1)(n + 2)}.
$$

Performing the operations indicated in (6), we obtain $E[(C^2)^2]$ from which we
subtract $[E(C^2)]^2$ and have as the variance of $C^2$,

$$
\sigma^2_{C^2} = \frac{4n(m - 1)(m + n + 1)(m + n + 2)}{m^3 (n + 2)^2 (n + 3)(n + 4)}.
$$

3. Significance values of $C^2$. If we let $C^2_\alpha$ be defined as the smallest value
of $C^2$ for which $P(C^2 \ge C^2_\alpha) \le \alpha$ then we can compute the value of $C^2_\alpha$ fairly
readily for small values of m and n. The values of $C^2_\alpha$ for m, n ≤ 10 are given
in Table I for α = 0.01, 0.05 and 0.10. Since the distribution of $C^2$ is not
continuous the probabilities $P(C^2 \ge C^2_\alpha)$ will, in general, be less than α.

TABLE I

Values of $C^2_\alpha$; α = 0.01, 0.05, 0.10

     m    α     n = 2     3      4      5      6      7      8      9     10
     4   .01       -      -      -
         .05       -      -      -
         .10       -      -    .800
     5   .01       -      -      -      -
         .05       -      -    .800   .833
         .10       -    .750   .800   .833
     6   .01       -      -      -      -    .857
         .05       -    .750   .800   .833   .857
         .10       -    .750   .800   .556   .413
     7   .01       -      -      -    .833   .857   .875
         .05       -    .750   .800   .588   .612   .467
         .10     .667   .750   .555   .425   .449   .426
     8   .01       -      -    .800   .833   .857   .656   .670
         .05       -    .750   .800   .594   .482   .469   .389
         .10     .667   .531   .425   .413   .357   .375   .358
     9   .01       -      -    .800   .833   .660   .677   .543   .554
         .05       -    .750   .602   .448   .413   .431   .395   .381
         .10     .667   .552   .454   .389   .363   .356   .321   .307
    10   .01       -      -    .800   .833   .677   .555   .549   .480   .449
         .05     .667   .750   .480   .493   .437   .415   .349   .340   .349
         .10     .487   .430   .380   .373   .357   .315   .309   .280   .269

It will be seen that if m and n increase indefinitely in the ratio n/m = γ,
then $nC^2$ converges stochastically to γ + 1 whereas $nC^2$ ranges from 0 to
$n^2/(n + 1)$, which indicates a tail to the right. This suggests that for larger
values of m and n, it is reasonable to try to fit the distribution of $nC^2$ by the
method of moments using a distribution of the form

$$
\frac{(k\chi^2)^{\nu/2 - 1}}{2^{\nu/2}\, \Gamma(\nu/2)}\, e^{-k\chi^2/2}\, d(k\chi^2), \tag{11}
$$

which has mean $\nu/k$ and variance $2\nu/k^2$.

Setting $\chi^2 = nC^2$, we see that we can consider $nkC^2$ distributed as $\chi^2$ with ν
degrees of freedom. Of course, ν is not necessarily an integer, but $\chi^2$ tables
may be used for approximate values of the probability that $nkC^2$ will exceed
certain values, [2] or the values of $nkC^2$ that will be exceeded a certain per cent
of the time. [3] More exact values of the probability that $nkC^2$ will exceed
a certain value may be found from a table of the incomplete Gamma function. [4]

To calculate k and ν directly, the following formulas obtained by equating
the mean and variance of (11) to the mean and variance of $nC^2$ may be used:

$$
k = am(n + 2)/n, \qquad \nu = an(n + m + 1)/(n + 1), \tag{12}
$$

where

$$
a = \frac{m(n + 3)(n + 4)}{2(m - 1)(m + n + 2)(n + 1)}.
$$

If the fitted curve (11) is used to obtain significance values of $nC^2$, there is a
tendency toward rejecting slightly over 100α%, especially for small values of
m and n. The error is probably due to fitting a curve having an infinite range.
The discrepancy decreases as m and n increase.
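The moment formulas and the fitted constants of (12) can be evaluated directly. The following is a minimal sketch, assuming only the expressions for the mean and variance of $C^2$ and for k, ν and a given above; the function names `c2_moments` and `chi_square_fit` are hypothetical.

```python
def c2_moments(n, m):
    """Mean and variance of C^2 (first sample of size n, second of size m)."""
    mean = n * (m + n + 1) / (m * (n + 1) * (n + 2))
    var = (4 * n * (m - 1) * (m + n + 1) * (m + n + 2)
           / (m ** 3 * (n + 2) ** 2 * (n + 3) * (n + 4)))
    return mean, var


def chi_square_fit(n, m):
    """Return (k, nu) of formula (12), so that n*k*C^2 is treated as
    chi-square with nu degrees of freedom."""
    a = m * (n + 3) * (n + 4) / (2 * (m - 1) * (m + n + 2) * (n + 1))
    k = a * m * (n + 2) / n
    nu = a * n * (n + m + 1) / (n + 1)
    return k, nu


if __name__ == "__main__":
    n, m = 12, 12
    k, nu = chi_square_fit(n, m)
    mean, var = c2_moments(n, m)
    # Moment matching: mean of n*C^2 is nu/k, variance is 2*nu/k^2.
    print(n * k, nu, n * mean, nu / k, n * n * var, 2 * nu / k ** 2)
```

The printed pairs agree, which is the moment matching that defines (12).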

The goodness of fit at the 0.01, 0.05 and 0.10 significance levels was tested
for two cases.

Case 1. n = 9, m = 10; nk = 45.397, ν = 7.429.

The exact distribution in the region under consideration is the following:

    C²_α           ...   .26 ‖  .28    .30    .32 ‖  .34    .36    .40    .42 ‖  .44    .48  ...
    P(C² ≥ C²_α)   ...  .121 ‖ .090   .082   .072 ‖ .037   .033   .025   .025 ‖ .016   .007  ...

The values of $C^2_\alpha$ from the fitted curve are $C^2_{.01} = 0.422$, $C^2_{.05} = 0.323$, and
$C^2_{.10} = 0.277$. The double rule indicates the divisions (from the fitted curve)
for α = 0.01, 0.05 and 0.10.

[2] Karl Pearson, Tables for Statisticians and Biometricians, Part 1, Table XII.
[3] R. A. Fisher, Statistical Methods for Research Workers, Table III.
[4] Tables of the Incomplete Gamma Function, Biometrika Office, London.

Case 2. n = 12, m = 12; nk = 65.068, ν = 8.938.

The important part of the exact distribution for our purposes is:

    C²_α           .215   .229   .243   .256   .270   ...   .326   .340   .354   .381   ...
    P(C² ≥ C²_α)   .120   .109   .078   .057   .046   ...   .014   .011   .009

The values of $C^2_\alpha$ from the fitted curve are $C^2_{.01} = 0.3315$, $C^2_{.05} = 0.2587$ and
$C^2_{.10} = 0.2244$.


4. Examples. 1. Two samples of ten members each are drawn and it is
desired to test, using a rejection region of size α, the hypothesis that these two
samples could have originated from the same population about which nothing
is assumed except that it is continuous. The first sample was found to divide
the second sample into the following groups: 0, 0, 0, 3, 0, 4, 0, 0, 2, 1, 0.

$$
C^2 = \left(\tfrac{1}{11} - \tfrac{3}{10}\right)^2 + \left(\tfrac{1}{11} - \tfrac{4}{10}\right)^2 + \left(\tfrac{1}{11} - \tfrac{2}{10}\right)^2 + \left(\tfrac{1}{11} - \tfrac{1}{10}\right)^2 + 7\left(\tfrac{1}{11}\right)^2 = .209
$$

which we see from Table I is not a significant value even for α = 0.10 since
$C^2_{.10} = 0.269$.

2. A sample of 15 divides a second of 25 into the following 16 groups: 0, 1,
0, 0, 5, 4, 1, 3, 9, 0, 0, 1, 0, 1, 0, 0.

$$
C^2 = \left(\tfrac{1}{16} - \tfrac{5}{25}\right)^2 + \left(\tfrac{1}{16} - \tfrac{4}{25}\right)^2 + \left(\tfrac{1}{16} - \tfrac{3}{25}\right)^2 + \left(\tfrac{1}{16} - \tfrac{9}{25}\right)^2 + 4\left(\tfrac{1}{16} - \tfrac{1}{25}\right)^2 + 8\left(\tfrac{1}{16}\right)^2
$$

$$
nC^2 = 2.302, \qquad k = 7.511, \qquad \nu = 10.19, \qquad nkC^2 = 17.295
$$

which gives a significant value for α = 0.10 but not for α = 0.05, since $nkC^2_{.10} =
16.233$, $nkC^2_{.05} = 18.568$. Actually $P(nkC^2 \ge 17.29) = .077$.
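The arithmetic of the two examples can be reproduced with a few lines of Python. This is a sketch only; `dixon_c2` is a hypothetical name, and it implements nothing beyond the definition (1) of the criterion applied to the group counts.

```python
def dixon_c2(groups):
    """C^2 of (1) for the counts m_1, ..., m_{n+1} of second-sample
    points falling in the n + 1 intervals cut off by the first sample."""
    m = sum(groups)          # size of the second sample
    cells = len(groups)      # n + 1
    return sum((1 / cells - g / m) ** 2 for g in groups)


if __name__ == "__main__":
    example_1 = [0, 0, 0, 3, 0, 4, 0, 0, 2, 1, 0]                   # n = 10, m = 10
    example_2 = [0, 1, 0, 0, 5, 4, 1, 3, 9, 0, 0, 1, 0, 1, 0, 0]    # n = 15, m = 25
    print(round(dixon_c2(example_1), 3))
    print(round(15 * dixon_c2(example_2), 3))
```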


5. Remarks. If we set W equal to the number of $m_i$ which are zero and
V = n + 1 - W then V is the number of non-zero $m_i$; further, 2V ≈ U where
U is the total number of runs, the criterion proposed in the paper of Wald
and Wolfowitz in the present issue of the Annals of Mathematical Statistics.
Now,

$$
W = \lim_{x \to 0} \sum_{i=1}^{n+1} x^{m_i}, \tag{13}
$$

so that, setting

$$
\Phi = \sum_m \int \exp\left[ \sum_{i=1}^{n+1} \theta_i \left( \frac{1}{n+1} - \frac{m_i}{m} \right) \right] \sum_{i=1}^{n+1} x^{m_i}\, P(m, p), \tag{14}
$$

analogous to (7), we have

$$
E(WC^2) = \lim_{x \to 0} \sum_{i=1}^{n+1} \left. \frac{\partial^2 \Phi}{\partial \theta_i^2} \right|_{\theta = 0},
$$

from which we can find

$$
\rho_{VC^2}\, \sigma_V\, \sigma_{C^2} = \frac{2n(1 - m)}{m(n + 2)(m + n)}
$$

and

$$
\rho^2_{VC^2} = \frac{(n + 3)(n + 4)(m + n - 1)}{(n + 1)(m + n + 1)(m + n + 2)}.
$$

If n/m = γ (a fixed constant) and n is large

$$
\rho^2 \cong \frac{\gamma}{1 + \gamma}.
$$

$\rho^2$ will be near 1 when n is much larger than m. This corresponds, in
computing $C^2$, to dividing the smaller sample into subgroups by the larger. In
this case U and $C^2$ give essentially the same information. When m and n are
more nearly equal the two criteria are quite different. For n > m, $C^2$ has
fewer possible values than for n < m, and is therefore a more sensitive test
when n < m.

While it is doubtful that this test is biased for large samples, this question
will not be considered in the present note.

Princeton University, 

Princeton, N. J. 


SIGNIFICANCE TEST FOR SPHERICITY OF A NORMAL n-VARIATE
DISTRIBUTION

By John W. Mauchly

1. Introduction. This note is concerned with testing the hypothesis that a
sample from a normal n-variate population is in fact from a population for
which the variances are all equal and the correlations are all zero. A population
having this symmetry will be called “spherical.” Under a linear orthogonal
transformation of variates, a spherical population remains spherical, and
consequently the features of a sample which furnish information relevant to this
hypothesis must be invariant under such transformations.

A situation for which this test is indicated arises when the sample consists
of N n-dimensional vectors, for which the variates are the n components along
coordinate axes known to be mutually perpendicular, but having an orientation
which is, a priori at least, quite arbitrary. A specific application for two
dimensions, treated elsewhere [1], may be mentioned. Each of N days furnishes
a sine and a cosine Fourier coefficient for a given periodicity, and these,
when plotted as ordinate and abscissa, yield a somewhat elliptical cloud of N
points. The sine and cosine functions are orthogonal, and their variances have
equal expectancies for a random series. The arbitrary nature of the orientation
of axes appears here as the arbitrary choice of phase, or origin of time. Of
five ellipses studied, three could easily have come from circular populations
(random), and two showed highly significant ellipticity.


2. Likelihood ratio criterion for sphericity. The method of Neyman and
Pearson [2] will be used to derive a test criterion which seems entirely suitable.
Let Ω be the class of all normal n-variate populations, and let ω be the subclass
of all normal n-variate populations satisfying the hypothesis of “sphericity.”
The likelihood ratio criterion is obtained by taking the ratio of the maximum
of the likelihood for variation of all population parameters specifying ω to the
maximum of the likelihood for variation of all population parameters
specifying Ω. That is,

$$
\lambda = \frac{P(\omega \max)}{P(\Omega \max)}. \tag{1}
$$

For the set Ω, the probability law for a single observation of the n variates
may be written:

$$
p = K\, |c_{ij}|^{1/2} \exp\left[ -\tfrac{1}{2} \sum_{i,j} c_{ij} (x_i - a_i)(x_j - a_j) \right], \qquad (i, j = 1, 2, \cdots, n), \tag{2}
$$

where $c_{ij}$ is an element of the matrix $\|\sigma_{ij}\|^{-1}$, the $\sigma_{ij}$ being variances and
covariances, $a_i$ is the mean value of the variate $x_i$ in the population, and K is a
constant the value of which does not concern us here. Then a sample of N
from Ω has the probability,

$$
P = K^N |c_{ij}|^{N/2} \exp\left[ -\tfrac{1}{2} \sum_{\alpha=1}^{N} \sum_{i,j} c_{ij} (x_{i\alpha} - a_i)(x_{j\alpha} - a_j) \right]. \tag{3}
$$

Letting

$$
\sum_{\alpha=1}^{N} x_{i\alpha} = N \bar{x}_i, \quad \text{and} \quad \sum_{\alpha=1}^{N} (x_{i\alpha} - \bar{x}_i)(x_{j\alpha} - \bar{x}_j) = N s_{ij}, \tag{4}
$$

differentiating the logarithm of P with respect to the parameters $a_i$ and $\sigma_{ij}$,
and setting these derivatives equal to zero, the maximum likelihood estimates

$$
\hat{a}_i = \bar{x}_i; \qquad \hat{\sigma}_{ij} = s_{ij} \tag{5}
$$

are obtained. Substituting these values in equation (3) we find that the
maximum value of the likelihood is

$$
P(\Omega \max) = K^N |s_{ij}|^{-N/2}\, e^{-Nn/2}. \tag{6}
$$

The derivation of P(ω max) proceeds upon similar lines, but is simpler, for
the probability law for the set ω is obtained from (3) by setting

$$
c_{ij} = c\, \delta_{ij}, \tag{7}
$$

where c is any positive constant, and $\delta_{ij} = 0$ if $i \ne j$ and 1 if $i = j$. The result
is found to be

$$
P(\omega \max) = K^N (s_0)^{-Nn/2}\, e^{-Nn/2} \tag{8}
$$

where $s_0$ is defined by

$$
n s_0 = \sum_{i=1}^{n} s_{ii}. \tag{9}
$$

The likelihood ratio criterion is therefore

$$
\lambda = \left[ \frac{|s_{ij}|}{(s_0)^n} \right]^{N/2}. \tag{10}
$$

It will be convenient to designate the Nth root of this statistic as $L_{sn}$, where
the second subscript indicates the number of variates:

$$
L_{sn} = \frac{|s_{ij}|^{1/2}}{(s_0)^{n/2}}. \tag{11}
$$

3. The moments of the distribution of $L_{sn}$ when the population is spherical.

The distribution of $L_{sn}$ cannot be easily obtained in explicit form for a general n,
but the moments of $L_{sn}$ when the hypothesis tested is true are easily found.

Note first that $L_{sn}$ may be resolved into two factors which are, when the
population is spherical, statistically independent:

$$
L_{sn} = \frac{(s_{11} s_{22} s_{33} \cdots s_{nn})^{1/2}}{(s_0)^{n/2}} \cdot |r_{ij}|^{1/2}. \tag{12}
$$

The first factor is just the one appropriate for testing the equality of the n
variances when the orientation of the coordinate axes is fixed in advance, while
the second factor is the square root of the determinant of correlation coefficients.
The moments of the distributions of these two statistics are known [3], and
since the two are independent (for zero correlation in the population), we may
write:

$$
M_h(L_{sn}) = M_h(A)\, M_h(B), \tag{13}
$$

where A and B are used to indicate the two factors, and $M_h$ indicates the hth
moment. The moments are given by

$$
M_h(L_{sn}) = n^{nh/2} \prod_{i=1}^{n} \left[ \frac{\Gamma\left(\tfrac{1}{2}(N - i + h)\right)}{\Gamma\left(\tfrac{1}{2}(N - i)\right)} \right] \frac{\Gamma\left(\tfrac{1}{2}n(N - 1)\right)}{\Gamma\left(\tfrac{1}{2}n(N - 1 + h)\right)}. \tag{14}
$$

4. Significance test for n = 2. For n = 1, $M_h(L_{s1}) = 1$ for any h, as it
should, since $L_{s1}$ is then identically 1, and the concept of sphericity is
meaningless. For n = 2, the expression (14) reduces to,

$$
M_h(L_{s2}) = \frac{\Gamma(N - 2 + h)\, \Gamma(N - 1)}{\Gamma(N - 1 + h)\, \Gamma(N - 2)} = \frac{N - 2}{N - 2 + h}, \tag{15}
$$

and the distribution is thus found to be

$$
D(L_{s2}) = (N - 2)\, L_{s2}^{N-3}\, dL_{s2}. \tag{16}
$$

Thus for n = 2, the significance of the value $L'_{s2}$ obtained from a given sample
of N points in a plane is simply

$$
P(L_{s2} \le L'_{s2}) = (L'_{s2})^{N-2}. \tag{17}
$$

These results for n = 2 were obtained by another method in [1].


5. Significance test for n = 3. For n = 3 and higher values of n, no simple
expression for the distribution seems obtainable. In this case it appears
reasonable to fit a Pearson curve of the type,

$$
y = K x^{p-1} (1 - x)^{q-1}, \tag{18}
$$

by adjusting p and q so as to obtain agreement with the first two moments of
the actual distribution. The calculations were carried out for $L_{s3}^2$ rather than
$L_{s3}$ itself, to simplify the moment expressions. The first moment of $L_{s3}^2$ is the
second moment of $L_{s3}$, and is given as a function of N by the equation,

$$
\mu_1(N) = \frac{(3N - 6)(3N - 9)}{(3N - 2)(3N - 1)}. \tag{19}
$$

Recurrence relations, similar to those noted by Lengyel [4] in carrying out a
similar task, hold for the moments of $L_{s3}^2$; hence,

$$
\mu_2(N) = \mu_1(N)\, \mu_1(N + 2). \tag{20}
$$

Explicit solution of the equations for p and q in terms of N is possible:

$$
p = \frac{(9N + 5)(N - 2)(N - 3)}{2(9N^2 - 8N - 15)}, \tag{21}
$$

$$
q = \frac{2(9N - 13)(9N + 5)}{9(9N^2 - 8N - 15)}. \tag{22}
$$

For values of N > 30, acceptable approximations to p and q are obtained by
carrying out the division indicated in (21) and (22):

$$
p = \tfrac{1}{2}(N - 4) + \frac{2}{9} + \frac{70}{81(N + 1)} \cdots, \tag{23}
$$

$$
q = 2 + \frac{140}{9(3N - 2)^2} \cdots. \tag{24}
$$

The values of p and q are given in Table I so that those desiring other than
the standard significance levels may readily enter the Pearson tables.
For N a multiple of 4 from 8 to 48, and a multiple of 10 from 50 to 100, the
significance levels were taken from the Incomplete Beta-Function Tables, using
adequate interpolation. The final Table I was then prepared by filling in the
skeleton table by interpolation with respect to N.

TABLE I

5%, 1%, and 0.1% levels of significance for the 3-dimensional sphericity criterion,
$L_{s3}^2 = \lambda^{2/N}$, and the values of p and q for the Pearson Type I curves used in
calculating these levels

      N      5%      1%     0.1%       p         q
      8    0.172   0.083   0.030     2.3239    2.0312
     10     .278    .165    .080     3.3044    2.0194
     12     .366    .243    .139     4.2911    2.0131
     14     .436    .312    .197     5.2816    2.0095
     16     .494    .372    .252     6.2744    2.0072
     18     .541    .423    .301     7.2688    2.0057
     20     .580    .466    .346     8.2642    2.0046
     22     .614    .504    .386     9.2605    2.0038
     24     .642    .538    .422    10.2574    2.0032
     26     .667    .567    .454    11.2548    2.0027
     28     .689    .593    .483    12.2526    2.0023
     30     .708    .616    .510    13.2506    2.0020
     32     .724    .637    .534    14.2488    2.0018
     34     .739    .655    .555    15.2473    2.0016
     36     .753    .672    .575    16.2458    2.0014
     38     .765    .687    .594    17.2447    2.0012
     40     .776    .701    .610    18.2435    2.0011
     42     .786    .714    .626    19.2425    2.0010
     44     .795    .726    .640    20.2416    2.0009
     46     .804    .736    .653    21.2408    2.0008
     48     .811    .746    .665    22.2400    2.0008
     50     .819    .756    .677    23.2394    2.0007
     55     .834    .776    .703       *         *
     60     .848    .793    .725    28.2365    2.0005
     65     .859    .808    .744       *         *
     70     .869    .821    .760    33.2345    2.0004
     75     .877    .832    .775       *         *
     80     .885    .842    .788    38.2328    2.0003
     85     .891    .851    .799       *         *
     90     .897    .859    .809    43.2317    2.0002
     95     .902    .866    .819       *         *
    100     .907    .872    .827    48.2308    2.0002

* No values for p and q were calculated for these values of N; the levels were obtained
by interpolation (see text).

From the results of Wilks [5] it follows that $-2N \log_e L_{sn}$ is, for large N,
distributed approximately like $\chi^2$ with n(n - 1)/2 degrees of freedom. However,
equation (24) above suggests that for large N one may get a very good
approximation (for n = 3) by setting q = 2; the significance test for n = 3
then becomes,

$$
P(L_{s3} \le L'_{s3}) = \tfrac{1}{2} (L'_{s3})^{N-4} \left[ (N - 2) - (N - 4)(L'_{s3})^2 \right]. \tag{25}
$$
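The closed forms (17), (19), (21), (22) and (25) lend themselves to a quick numerical check against Table I. The sketch below assumes only those formulas as reconstructed above; the function names are hypothetical, and the argument `L` denotes the observed value of the criterion (not its square).

```python
from math import sqrt


def mu1(N):
    """First moment of L_s3^2 from (19)."""
    return (3 * N - 6) * (3 * N - 9) / ((3 * N - 2) * (3 * N - 1))


def pearson_pq(N):
    """p and q of the fitted Pearson Type I curve for n = 3, from (21), (22)."""
    denom = 9 * N * N - 8 * N - 15
    p = (9 * N + 5) * (N - 2) * (N - 3) / (2 * denom)
    q = 2 * (9 * N - 13) * (9 * N + 5) / (9 * denom)
    return p, q


def significance_n2(N, L):
    """Equation (17): probability of a value of L_s2 not exceeding L."""
    return L ** (N - 2)


def significance_n3_approx(N, L):
    """Equation (25): large-N approximation for n = 3, obtained with q = 2."""
    return 0.5 * L ** (N - 4) * ((N - 2) - (N - 4) * L * L)


if __name__ == "__main__":
    for N in (8, 30, 100):
        p, q = pearson_pq(N)
        print(N, round(p, 4), round(q, 4), round(p / (p + q) - mu1(N), 12))
    # Table I lists L_s3^2 = 0.907 as the 5% level for N = 100.
    print(significance_n3_approx(100, sqrt(0.907)))
```

The mean p/(p + q) of the fitted curve reproduces (19) exactly, which is a useful consistency check on (21) and (22).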

Probably similar approximations can be found for other values of n. It is a
pleasure to acknowledge the helpful comments and advice which I received
from Mr. A. M. Mood of Princeton. Recognition is also due Mr. Wallace
Brey, a student assistant under the National Youth Administration, who aided
in the computations.

A SIMPLE SAMPLING EXPERIMENT ON CONFIDENCE INTERVALS

By S. Kullback and A. Frankel

1. Introduction. In order to illustrate some of the notions of the theory of
confidence or fiducial limits in connection with a course in Statistical Inference
at the George Washington University, we had the class carry out certain simple
experiments, following a suggestion in one of Neyman's papers on Statistical
Estimation [1]. In the belief that the experimental data may be of interest
to others, we present the results herein.

2. The problem. We consider the problem of estimating the range θ of a
rectangular population defined by p(x, θ) dx = dx/θ, 0 ≤ x ≤ θ, and in
particular, for simplicity, we limit ourselves to samples of two and four. We
consider three possible approaches to the problem, viz., by using (a) the sample
range, (b) the sample average or total, (c) the larger (largest) sample value.
Let us consider each in turn.

(a) Sample range. Wilks [2] has shown that for samples of n and confidence
coefficient 1 - α, the confidence or fiducial limits for the population range θ
are given by r and $r/\psi_\alpha$, where r is the sample range and $\psi_\alpha$ is determined by

$$
\psi_\alpha^{n-1} \left[ n - (n - 1)\psi_\alpha \right] = \alpha. \tag{1}
$$

For n = 2, α = 0.19 and n = 4, α = 0.1792, (1) yields $\psi_\alpha = 0.1$ and $\psi_\alpha = 0.4$
respectively. Accordingly, for samples of two with confidence coefficient
1 - α = 0.81, and for samples of four with confidence coefficient 1 - α =
0.8208, the confidence interval is respectively given by

(2) (r, 10r) and (r, 2.5r).

The length, $\lambda_r$, of the confidence interval is respectively 9r and 1.5r. Using
the distribution of r, $n(n - 1)(\theta - r) r^{n-2}\, dr / \theta^n$, we have for samples of two:
$E(\lambda_r) = 3\theta$, $\sigma_{\lambda_r} = 2.1213\theta$, and for samples of four: $E(\lambda_r) = 0.9\theta$, $\sigma_{\lambda_r} = 0.3\theta$.

(b) Sample total. Following Neyman [1, p. 357] let us denote by A(θ) the
region defined by

(3) θ - Δ ≤ $x_1 + x_2$ ≤ θ + Δ

where θ is the population range, $x_1$ and $x_2$ the sample values of the sample $E_2$
and Δ is selected so as to have $P\{E_2 \in A(\theta) \mid \theta\} = 1 - \alpha$. It is readily found
that $P\{E_2 \in A(\theta) \mid \theta\} = [\theta^2 - (\theta - \Delta)^2]/\theta^2 = 1 - \alpha$ from which we find that
$\Delta = \theta(1 - \alpha^{1/2})$. Accordingly (3) becomes $\theta\alpha^{1/2} \le x_1 + x_2 \le \theta(2 - \alpha^{1/2})$,
yielding the confidence limits $(x_1 + x_2)/(2 - \alpha^{1/2})$, $(x_1 + x_2)/\alpha^{1/2}$. For the
confidence coefficient 1 - α = 0.81 the confidence interval is given by

(4) [0.6394($x_1 + x_2$), 2.2941($x_1 + x_2$)].

The length of the confidence interval is given by $\lambda_T = 1.6547(x_1 + x_2)$ so that
$E(\lambda_T) = 1.6547\theta$, $\sigma_{\lambda_T} = 0.6755\theta$.

Let us denote by A'(θ) the region defined by

(5) 2θ - Δ ≤ $x_1 + x_2 + x_3 + x_4$ ≤ 2θ + Δ,

where θ is the population range, $x_1, x_2, x_3, x_4$ the sample values of the sample
$E_4$ and Δ is selected so as to have $P\{E_4 \in A'(\theta) \mid \theta\} = 1 - \alpha$. Using the known
distribution of the sample average [3] and 1 - α = 0.8208, it is readily found
that

1 * 1 + »(*)}-"" 

from which we find that Δ = 0.788θ. Accordingly, (5) becomes 1.212θ ≤
$x_1 + x_2 + x_3 + x_4$ ≤ 2.788θ, yielding the confidence interval

(6) [0.3587($x_1 + x_2 + x_3 + x_4$), 0.8251($x_1 + x_2 + x_3 + x_4$)].

The length of the confidence interval is given by $\lambda_T = 0.4664(x_1 + x_2 + x_3 + x_4)$
so that $E(\lambda_T) = 0.9328\theta$ and $\sigma_{\lambda_T} = 0.2679\theta$.

(c) Larger (largest) sample value. Again following Neyman [1, p. 359] let us
denote by $A_1(\theta)$ the region defined by

(7) qθ ≤ L ≤ θ

where θ is the population range, L the larger of the two sample values $x_1$
and $x_2$, and q, a number between zero and unity, to be determined by
$P\{E_2 \in A_1(\theta) \mid \theta\} = 1 - \alpha$. It is readily found that
$P\{E_2 \in A_1(\theta) \mid \theta\} = (\theta^2 - q^2\theta^2)/\theta^2 = 1 - \alpha$,
from which we find that $q = \alpha^{1/2}$. Accordingly, (7) becomes $\theta\alpha^{1/2} \le L \le \theta$
yielding the confidence limits L, $L/\alpha^{1/2}$. For the confidence coefficient 1 - α =
0.81 the confidence interval is given by

(8) (L, 2.2941L).

TABLE I

    No. of cases                             Frequency
    of coverage            Range                Sum           Larger (Largest)
    per set of       Samples  Samples    Samples  Samples    Samples  Samples
    100 samples      of two   of four    of two   of four    of two   of four
        69              -        -          -        -          1        -
        70              -        -          -        -          -        -
        71              -        -          -        -          1        -
        72              -        -          -        -          -        -
        73              -        -          -        -          -        1
        74              -        1          -        -          1        -
        75              -        -          -        -          -        -
        76              4        -          3        -          4        1
        77              2        -          6        1          2        -
        78              3        -          6        -          3        1
        79              9        2          4        2          3        -
        80              3        1          6        -          4        -
        81              2        2          1        -          3        -
        82              2        1          6        1          2        5
        83              3        3          3        1          5        3
        84              3        2          -        1          4        1
        85              3        -          -        3          2        -
        86              2        2          -        2          2        1
        87              1        1          2        1          -        1
        88              -        -          1        2          1        1
        89              1        -          1        1          -        -
        90              -        -          -        -          -        -
        91              1        -          -        -          -        -
      Total            39       15         39       15         39       15
      Average         81.1     82.1       80.2     84.2       80.2     82.1


The length of the confidence interval is given by $\lambda_L = 1.2941L$ so that using
the distribution of L, $nL^{n-1}\, dL/\theta^n$, we have $E(\lambda_L) = 0.8627\theta$ and $\sigma_{\lambda_L} = 0.3050\theta$.
Incidentally, since $L \le x_1 + x_2$ we have $1.2941L < 1.6547(x_1 + x_2)$ so that
in every case, for samples of two, the confidence interval of procedure (c) is
shorter than the confidence interval of procedure (b).

For samples of four, we consider the region (7) where L is the largest of the
sample values $x_1, x_2, x_3$ and $x_4$ of the sample $E_4$. It is readily found that
$P\{E_4 \in A_1(\theta) \mid \theta\} = (\theta^4 - q^4\theta^4)/\theta^4 = 1 - \alpha$, from which we find that $q^4 = \alpha$.
For α = 0.1792, q = 0.6506 so that (7) becomes 0.6506θ ≤ L ≤ θ yielding
the confidence interval

(9) (L, 1.5370L).

The length of the confidence interval is given by $\lambda_L = 0.5370L$ so that $E(\lambda_L) =
0.4296\theta$ and $\sigma_{\lambda_L} = 0.0877\theta$.


TABLE II

                                      Sample         Range                  Sum             Larger (Largest)
                                       size   Theoretical Observed  Theoretical Observed  Theoretical Observed
    Confidence coefficient               2       .8100     .8110       .8100     .802        .8100     .8020
                                         4       .8208     .8210       .8208     .842        .8208     .8210
    Average length of confidence         2      3.0000    2.9660      1.6547    1.6441       .8627     .8556
    interval per set of 100 samples      4       .9000     .8976       .9328     .9296       .4296     .4272
    Standard deviation of average        2       .2121     .2133       .0676     .0581       .0305     .0293
    length of confidence interval        4       .0300     .0335       .0268     .0140       .0088     .0093

3. The Experimental Data. We considered the rectangular population with
$\theta = 1$ and obtained the sample values by using pairs of digits obtained from
Tippett's random sample tables [4]. Using these observed values, the confidence
intervals given by (2), (4), (6), (8) and (9) were computed, and the number
of cases in which the value $\theta = 1$ was covered was noted. In all, 3900 samples
of two were observed, subdivided into 39 sets of 100 each. The samples of
four were obtained by combining pairs of samples of two, and 1500 samples of
four were studied, subdivided into 15 sets of 100 each. Table I gives the
observed distribution of the number of cases of coverage per set of 100 samples
of two and of four. The length of the confidence interval given by each of
the three procedures was obtained, and the observed mean and standard deviation
of the distribution of the average length of the confidence interval per set
of 100 samples were computed. (Since they are averages of 100 values, these
observations are practically normally distributed.) Table II summarizes these
results.
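The sampling experiment can be imitated with a pseudo-random generator in place of Tippett's tables. This is only an illustrative sketch: the mapping of a pair of digits to the value `d / 100`, the seed, and the function name `coverage_counts` are assumptions made here, and only the largest-value procedure is simulated.

```python
import random

def coverage_counts(n, upper, sets=39, per_set=100, seed=1):
    """For each set of `per_set` samples of size n, count how often the
    interval (L, upper * L) covers theta = 1, L being the largest value."""
    rng = random.Random(seed)
    counts = []
    for _ in range(sets):
        covered = 0
        for _ in range(per_set):
            # Two random digits read as 0.00, 0.01, ..., 0.99 (an assumed mapping).
            sample = [rng.randrange(100) / 100.0 for _ in range(n)]
            largest = max(sample)
            if largest < 1.0 < upper * largest:
                covered += 1
        counts.append(covered)
    return counts

counts_two = coverage_counts(2, 2.2941, sets=39)
counts_four = coverage_counts(4, 1.5370, sets=15)
print(sum(counts_two) / len(counts_two), sum(counts_four) / len(counts_four))
```

The two printed averages play the role of the "Average" line of Table I for the Larger (Largest) columns; with rounded two-digit values they fall slightly below the theoretical coefficients.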
THE NUMERICAL COMPUTATION OF THE PRODUCT OF CONJUGATE
IMAGINARY GAMMA FUNCTIONS 


By A. C. Cohen, Jr. 

The difference equation

(1) $\dfrac{f_{x+1}}{f_x} = \dfrac{x^2 + c_1 x + c_2}{x^2 + c_3 x + c_4}$

was used by Professor Harry C. Carver [1] as the basis for graduating frequency
distributions in a manner analogous to the use of the differential equation

$\dfrac{1}{y}\dfrac{dy}{dx} = \dfrac{a - x}{b_0 + b_1 x + b_2 x^2}$

in the Pearson system of frequency curves. In order to determine a particular
$f_x$ by Professor Carver's method it was necessary to perform the complete graduation
from the lower limit of the range up to and including the required $f_x$.
When $x$ is large and only isolated values of $f_x$ are required it seems desirable to
have a method for computing $f_x$ directly, and the present note seeks to accomplish
this purpose.

It is well known [2] that the difference equation

(2) $\dfrac{f_{x+1}}{f_x} = \dfrac{(x - \alpha_1)(x - \alpha_2) \cdots (x - \alpha_n)}{(x - \beta_1)(x - \beta_2) \cdots (x - \beta_m)}$

has the solution

(3) $f_x = w_x\,\dfrac{\Gamma(x - \alpha_1) \cdots \Gamma(x - \alpha_n)}{\Gamma(x - \beta_1) \cdots \Gamma(x - \beta_m)}$,

where $w_x$ is a periodic function of $x$ ($w_x = w_{x+1} = \cdots = K$) and $\Gamma(x + 1)$
for $x$ a positive real number may be defined in the usual manner by the second
Euler integral

(4) $\Gamma(x + 1) = \displaystyle\int_0^\infty t^x e^{-t}\,dt$,

which obeys the recursion formula

(5) $\Gamma(x + 1) = x\,\Gamma(x)$.

When $x$ is a positive integer

(6) $\Gamma(x + 1) = x!$.

Equation (1) is seen to be a special case of (2) for $n = m = 2$ and accordingly,
the solution may be written as

(7) $f_x = K\,\dfrac{\Gamma(x - \alpha_1)\Gamma(x - \alpha_2)}{\Gamma(x - \beta_1)\Gamma(x - \beta_2)}$,

where $\alpha_1$ and $\alpha_2$ are roots of $x^2 + c_1 x + c_2 = 0$ and $\beta_1$ and $\beta_2$ are roots of
$x^2 + c_3 x + c_4 = 0$. The following simple examples illustrate three special
cases of this solution.

I. All $\alpha$'s and $\beta$'s are integers.

$\dfrac{f_{x+1}}{f_x} = \dfrac{2(x^2 + 9x + 20)}{x^2 + 5x + 6}$

has the solution

$f_x = K\,2^x\,\dfrac{\Gamma(x + 4)\Gamma(x + 5)}{\Gamma(x + 2)\Gamma(x + 3)}$,

which, with the aid of recursion formula (5), can readily be verified by direct
substitution.

II. Either the $\alpha$'s and/or the $\beta$'s are real irrational numbers.

$\dfrac{f_{x+1}}{f_x} = \dfrac{x^2 + 5x + 6}{x^2 + 3x + 1}$

has the solution

$f_x = K\,\dfrac{\Gamma(x + 2)\Gamma(x + 3)}{\Gamma[x + \tfrac{1}{2}(3 - \sqrt{5})]\,\Gamma[x + \tfrac{1}{2}(3 + \sqrt{5})]}$,

which, with the aid of the recursion formula (5), can also be verified by direct
substitution.

III. Either the $\alpha$'s and/or the $\beta$'s are complex.

$\dfrac{f_{x+1}}{f_x} = \dfrac{x^2 + 8x + 17}{x^2 + 10x + 29}$

has the solution

$f_x = K\,\dfrac{\Gamma(x + 4 + i)\Gamma(x + 4 - i)}{\Gamma(x + 5 + 2i)\Gamma(x + 5 - 2i)}$.

Since the recursion formula (5) is also valid for complex arguments [3], this
solution can be verified by direct substitution just as in the first two cases.
The evaluation of $f_x$ for a given $x$ in cases I and II involves only computation
of quantities of the form $\Gamma(x)$, which can be accomplished through the use of
existing tables of Gamma Functions for small values of $x$ and through application
of Stirling's formula for large values of $x$. Evaluation of $f_x$ in case III,
however, involves the computation of quantities of the form $\Gamma(u + iv)\Gamma(u - iv)$,
a problem which seems to have escaped previous attention. The remainder of
the present discussion will center about this quantity.

The Gamma Function for a real positive argument has been defined by
equation (4), but for the present purposes, it is more expedient to use the
definition

(8) $\Gamma(z) = \lim_{n \to \infty} \dfrac{n!\,n^z}{z(z + 1) \cdots (z + n)}$,

which is valid for all values of the complex argument $z$ except at the poles
($z = 0$; $z = -1$; $z = -2$, etc.). The above definition is equivalent to (4) at all points
where (4) is valid [3].

From equation (8), it immediately follows that $\Gamma(u + iv)\Gamma(u - iv)$ is a real
number. In fact, we have

$\Gamma(u + iv)\Gamma(u - iv) = \lim_{n \to \infty} \dfrac{(n!)^2\,n^{2u}}{[u^2 + v^2][(u + 1)^2 + v^2] \cdots [(u + n)^2 + v^2]}$.


We now develop a formula applicable in evaluating this quantity when $u$ is a
sufficiently small positive integer. As a consequence of equation (8) it can be
shown that [3]

(9) $\Gamma(z)\Gamma(1 - z) = \dfrac{\pi}{\sin \pi z}$.

Let $z = iv$ in the above equation and we immediately obtain the result

(10) $\Gamma(iv)\Gamma(-iv) = \dfrac{2\pi}{v\,(e^{\pi v} - e^{-\pi v})}$.

When $u$ is a positive integer, we may write

(11) $\Gamma(u + iv) = (u - 1 + iv)(u - 2 + iv) \cdots (iv)\,\Gamma(iv)$,

(12) $\Gamma(u - iv) = (u - 1 - iv)(u - 2 - iv) \cdots (-iv)\,\Gamma(-iv)$.

The product of (11) by (12) gives

$\Gamma(u + iv)\Gamma(u - iv) = v^2(v^2 + 1) \cdots (v^2 + (u - 1)^2)\,\Gamma(iv)\Gamma(-iv)$,

which upon substitution of the value found in Equation (10) for $\Gamma(iv)\Gamma(-iv)$
becomes

(13) $\Gamma(u + iv)\Gamma(u - iv) = \dfrac{2\pi v}{e^{\pi v} - e^{-\pi v}} \displaystyle\prod_{r=1}^{u-1} (v^2 + r^2)$.
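The closed form for a positive integer $u$ is easy to evaluate directly. The sketch below is one possible transcription of the product formula $2\pi v/(e^{\pi v} - e^{-\pi v}) \prod_{r=1}^{u-1}(v^2 + r^2)$; the function name `conj_gamma_product` is ours, and the check against $\Gamma(u)^2$ for very small $v$ is an added sanity test rather than something taken from the paper.

```python
import math

def conj_gamma_product(u, v):
    """Gamma(u + iv) * Gamma(u - iv) for a positive integer u and real v != 0,
    computed as 2*pi*v / (e^{pi v} - e^{-pi v}) * prod_{r=1}^{u-1} (v^2 + r^2)."""
    if u < 1 or int(u) != u or v == 0:
        raise ValueError("u must be a positive integer and v must be nonzero")
    value = 2.0 * math.pi * v / (math.exp(math.pi * v) - math.exp(-math.pi * v))
    for r in range(1, int(u)):
        value *= v * v + r * r
    return value

# As v tends to 0 the product tends to Gamma(u)^2 = ((u - 1)!)^2.
print(conj_gamma_product(3, 1e-6))   # close to Gamma(3)^2
print(conj_gamma_product(4, 1.0))    # Gamma(4 + i) * Gamma(4 - i)
```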
To obtain a result that is applicable when $u$ is not a positive integer, we
make use of Stirling's formula for complex arguments. Lipschitz [4] proves

$\operatorname{Log}\Gamma(z) = \log\sqrt{2\pi} + (z - \tfrac{1}{2})\log z - z + \displaystyle\sum_{m=0} \dfrac{(-1)^m B_{2m+1}}{(2m + 1)(2m + 2)}\,\dfrac{1}{z^{2m+1}}$

and that the remainder after the $m$th term is

$R_{m+1} = \dfrac{(-1)^{m+1} B_{2m+3}}{(2m + 3)(2m + 4)\,z^{2m+3}}\,(\epsilon + i\epsilon')$,

where $\epsilon < 1$; $\epsilon' < 1$. $B_{2m+1}$ designates the Bernoulli numbers. ($B_1 = \tfrac{1}{6}$;
$B_3 = \tfrac{1}{30}$; $B_5 = \tfrac{1}{42}$; etc.) We are thus able to write

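(15) $\operatorname{Log}\Gamma(u + iv) = \log\Gamma(Re^{i\varphi}) = \log\sqrt{2\pi} + (Re^{i\varphi} - \tfrac{1}{2})(\log R + i\varphi) - Re^{i\varphi} + \displaystyle\sum_{m=0} \dfrac{(-1)^m B_{2m+1}\,e^{-(2m+1)i\varphi}}{(2m + 1)(2m + 2)\,R^{2m+1}}$,

where $\varphi = \tan^{-1}\dfrac{v}{u}$ and $R = \sqrt{v^2 + u^2}$;

(16) $\operatorname{Log}\Gamma(u - iv) = \log\Gamma(Re^{-i\varphi}) = \log\sqrt{2\pi} + (Re^{-i\varphi} - \tfrac{1}{2})(\log R - i\varphi) - Re^{-i\varphi} + \displaystyle\sum_{m=0} \dfrac{(-1)^m B_{2m+1}\,e^{(2m+1)i\varphi}}{(2m + 1)(2m + 2)\,R^{2m+1}}$.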

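Adding (15) and (16), we obtain

$\operatorname{Log}\Gamma(u + iv)\Gamma(u - iv) = \log 2\pi + (e^{i\varphi} + e^{-i\varphi})R\log R - \log R + Ri\varphi(e^{i\varphi} - e^{-i\varphi}) - R(e^{i\varphi} + e^{-i\varphi}) + \displaystyle\sum_{m=0} \dfrac{(-1)^m B_{2m+1}}{(2m + 1)(2m + 2)}\left(e^{(2m+1)i\varphi} + e^{-(2m+1)i\varphi}\right)\dfrac{1}{R^{2m+1}}$,

which upon being simplified becomes

(17) $\operatorname{Log}\Gamma(u + iv)\Gamma(u - iv) = \log 2\pi + (2u - 1)\log R - 2(\varphi v + u) + 2\psi(R, \varphi)$,

where

(18) $\psi(R, \varphi) = \displaystyle\sum_{m=0} \dfrac{(-1)^m B_{2m+1}}{(2m + 1)(2m + 2)}\,\dfrac{1}{R^{2m+1}}\cos(2m + 1)\varphi$.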


This result is somewhat similar to that obtained by Karl Pearson [5] in connection
with the evaluation of the $G(r, \nu)$ integrals of his Type IV frequency
curve. If $R > 1$, the expansion of $\psi(R, \varphi)$ is asymptotic and the greatest
numerical value that the $m$th term can have is

$\dfrac{B_{2m+1}}{(2m + 1)(2m + 2)}\,\dfrac{1}{R^{2m+1}}$.

Thus according to Lipschitz's results, the error committed in dropping all terms
after the $m$th will not exceed $\dfrac{B_{2m+1}}{(2m + 1)(2m + 2)}\,\dfrac{1}{R^{2m+1}}$. The following
table gives an indication of the size of the error:

Terms omitted      Error committed in
after              $\psi(R, \varphi)$ less than

1st                ±.0833 3333/$R$
2nd                ±.0027 7777/$R^3$
3rd                ±.0007 9365/$R^5$
4th                ±.0005 9524/$R^7$
5th                ±.0008 4175/$R^9$


It is now obvious that formula (18) will give satisfactory results whenever $R$
is sufficiently large. The degree of accuracy required together with the value
of $R$ will determine the number of terms of $\psi(R, \varphi)$ to be computed.
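The asymptotic formula can be compared with the exact product for integer $u$. The sketch below assumes the older Bernoulli notation of the text ($B_1 = 1/6$, $B_3 = 1/30$, ...) and doubles the cosine series because the two conjugate Stirling series add. The function names and the choice of natural logarithms are ours.

```python
import math

# Bernoulli numbers in the older notation: B1, B3, B5, B7, B9.
BERNOULLI = [1 / 6, 1 / 30, 1 / 42, 1 / 30, 5 / 66]

def log_conj_gamma_product(u, v, terms=1):
    """Natural log of Gamma(u + iv) * Gamma(u - iv) from the asymptotic series,
    keeping `terms` terms of the cosine correction series."""
    R = math.hypot(u, v)
    phi = math.atan2(v, u)
    psi = 0.0
    for m in range(terms):
        k = 2 * m + 1
        psi += (-1) ** m * BERNOULLI[m] * math.cos(k * phi) / (k * (k + 1) * R ** k)
    return (math.log(2 * math.pi) + (2 * u - 1) * math.log(R)
            - 2 * (phi * v + u) + 2 * psi)

def log_exact_integer_u(u, v):
    """The same quantity from the closed product form, valid for integer u >= 1."""
    total = math.log(2 * math.pi * v)
    total -= math.log(math.exp(math.pi * v) - math.exp(-math.pi * v))
    for r in range(1, u):
        total += math.log(v * v + r * r)
    return total

for u, v in [(19, 1.0), (20, 2.0)]:
    approx = log_conj_gamma_product(u, v, terms=1)
    exact = log_exact_integer_u(u, v)
    print(u, v, approx / math.log(10), exact / math.log(10))
```

Dividing by `math.log(10)` converts to the common logarithms used in the worked example.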

We now turn to the solution of the example under Case III and proceed to
calculate $f_4$, $f_{15}$, and $f_m$ when $f_0 = 29$. We may write

$K = 29\,\dfrac{\Gamma(5 + 2i)\Gamma(5 - 2i)}{\Gamma(4 + i)\Gamma(4 - i)}$.

Application of formula (13) gives

$\Gamma(5 + 2i)\Gamma(5 - 2i) = 244.043\ 648$,
$\Gamma(4 + i)\Gamma(4 - i) = 27.202\ 292$,

from which $K = 260.171\ 676$, and

$f_4 = 260.171\ 676\;\dfrac{\Gamma(8 + i)\Gamma(8 - i)}{\Gamma(9 + 2i)\Gamma(9 - 2i)}$.


Again making use of formula (13) we have

$f_4 = 260.171\ 676 \cdot \dfrac{22{,}243{,}314}{1{,}020{,}258{,}635} = 5.6722$,

$f_{15} = 260.171\ 676\;\dfrac{\Gamma(19 + i)\Gamma(19 - i)}{\Gamma(20 + 2i)\Gamma(20 - 2i)}$.

Since $R$ is fairly large in this instance, formula (17) is used and all terms of
$\psi(R, \varphi)$ after the first are dropped. This result gives

$\log\Gamma(19 + i)\Gamma(19 - i) = 31.5892\ 259$,

$\log\Gamma(20 + 2i)\Gamma(20 - 2i) = 34.0812\ 782$.




Accordingly, $\log f_{15} = 9.9232\ 071 - 10$

and $f_{15} = .8379$.

By the same method $f_m$ is calculated and we find $f_m = .008723$.

As a check on the accuracy of the results obtained in the above computations,
values of $f_x$ for $x$ ranging from 1 to 15 were computed, using the given equation
as a recursion formula. That is,

$f_1 = \tfrac{17}{29} f_0 = 17$, $f_2 = \tfrac{26}{40} f_1 = 11.05$, etc.

These results are given in the following table, and it is to be noted that the
values in the table for $f_4$ and $f_{15}$ agree with those previously computed by use
of formulas contained in this paper. For obvious reasons, no attempt was
made to compute the value of $f_m$ by this method.


TABLE I

  x      f_x         x      f_x         x      f_x

  0    29.0000       5    4.3375       10    1.6228
  1    17.0000       6    3.4200       11    1.3961
  2    11.0500       7    2.7633       12    1.2135
  3     7.7142       8    2.2779       13    1.0644
  4     5.6722       9    1.9092       14    0.9411
                                       15    0.8379
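The agreement between the recursion and the Gamma-function solution of Case III can be reproduced in a few lines. This is a sketch under the assumption that $f_0 = 29$ and that the product $\Gamma(u + iv)\Gamma(u - iv)$ for integer $u$ is evaluated by the closed product form; the helper names are ours.

```python
import math

def conj_gamma_product(u, v):
    """Gamma(u + iv) * Gamma(u - iv) for integer u >= 1 (closed product form)."""
    value = 2.0 * math.pi * v / (math.exp(math.pi * v) - math.exp(-math.pi * v))
    for r in range(1, u):
        value *= v * v + r * r
    return value

def f_by_recursion(x, f0=29.0):
    """Apply f_{j+1} = f_j (j^2 + 8j + 17) / (j^2 + 10j + 29) starting from f_0."""
    f = f0
    for j in range(x):
        f *= (j * j + 8 * j + 17) / (j * j + 10 * j + 29)
    return f

def f_closed_form(x, f0=29.0):
    """f_x = K Gamma(x+4+i)Gamma(x+4-i) / [Gamma(x+5+2i)Gamma(x+5-2i)],
    with K fixed by the value of f_0."""
    K = f0 * conj_gamma_product(5, 2.0) / conj_gamma_product(4, 1.0)
    return K * conj_gamma_product(x + 4, 1.0) / conj_gamma_product(x + 5, 2.0)

for x in (1, 2, 4, 15):
    print(x, round(f_by_recursion(x), 4), round(f_closed_form(x), 4))
```

Because the same closed form is used in $K$ and in the ratio, any common constant cancels, which is why the two routes agree to rounding error.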
COMPARISON OF PEARSONIAN APPROXIMATIONS WITH EXACT
SAMPLING DISTRIBUTIONS OF MEANS AND VARIANCES 
IN SAMPLES FROM POPULATIONS COMPOSED OF 
THE SUMS OF NORMAL POPULATIONS 

By G. A. Baker 

1. Introduction. Biological and sociological data are often "non-homogeneous"
and of such a nature as not to be easily separated into components.
Non-homogeneous populations have been discussed by Karl Pearson, Charlier,
and others. Non-normal material has been discussed by many writers. See,
for example, A. E. R. Church [1] and J. M. LeRoux [2] for a discussion of
moments of the distributions of the means and variances for samples from
non-normal material.

In a previous paper [3] the author has given the distributions of the means
and standard deviations of samples from certain non-homogeneous populations.
The purpose of the present paper is to extend the results given in [3] and to
compare the moment approach of the Pearsonian school with the true distributions.


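2. Moments of the distribution of means of samples of n from a non-homogeneous
population. Consider a population with distribution

(2.1) $f(x) = \dfrac{1}{(1 + k)\sqrt{2\pi}}\left[e^{-\frac{1}{2}x^2} + \dfrac{k}{\sigma}\,e^{-\frac{1}{2}\left(\frac{x - m}{\sigma}\right)^2}\right]$.

The first four moments of (2.1) about $x = 0$ are

(2.2)
$\nu_1' = \dfrac{km}{1 + k}$,
$\nu_2' = \dfrac{1}{1 + k}\left[1 + k(\sigma^2 + m^2)\right]$,
$\nu_3' = \dfrac{km}{1 + k}\left[3\sigma^2 + m^2\right]$,
$\nu_4' = \dfrac{1}{1 + k}\left[3 + k(3\sigma^4 + 6m^2\sigma^2 + m^4)\right]$.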


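The means of samples of $n$ drawn at random from (2.1) are distributed
according to

(2.3) $\dfrac{n}{\sqrt{2\pi}\,(1 + k)^n}\displaystyle\sum_{s=0}^{n}\binom{n}{s}\dfrac{k^s}{\sqrt{s\sigma^2 + n - s}}\exp\left[-\dfrac{(n\bar{x} - sm)^2}{2(s\sigma^2 + n - s)}\right]$.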
Denote by $m_p'$ the moments of (2.3) about $\bar{x} = 0$ and by $m_p$ the moments about
the mean. Then in view of the relations

(2.4) $\dfrac{n!\,s^r}{(n - s)!\,s!} = \displaystyle\sum_{i=0}^{r-1} A_{ri}\,\dfrac{n!}{(n - s)!\,(s - r + i)!}$,

$A_{r0} = 1$, $A_{r(r-1)} = 1$, $A_{31} = 3$, $A_{41} = 6$, $A_{42} = 7$,
$A_{51} = 10$, $A_{52} = 25$, $A_{53} = 15$,

and similar relations, and reduction to moments about the mean we obtain


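$m_1' = \dfrac{km}{1 + k} = \nu_1'$,

$m_2 = \dfrac{1}{n(1 + k)}\left[k\sigma^2 + 1 + \dfrac{k}{1 + k}\,m^2\right]$,

$m_3 = \dfrac{km}{n^2(1 + k)^2}\left[3\sigma^2 - 3 + \dfrac{1 - k}{1 + k}\,m^2\right]$,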

mi = rA(l + ky [ 3 ^ nfc + ^ + 3 (n + k) + 6(n - l)fer 2 


(2.5) 


+ _ |fc + („ _ l)}m 2 + — ((n - l)fc 4-1 
(T+lc) 2 ^ ~ ^ k l)m 4 J 


2 2 
TO a- 


m 6 = 


Jo 


n 4 (l + fc) 3 |_ 16 ^ 2w ~ 15 {fc + (2n - l))m 

+ 30 (n — 1)(1 — fc)TOo- 2 

+ rqpi { ~ (n ~ 1)ft + 4 (" - D* + !)mV 

+ Y+l k - 4(n - l)k + (n — 1 )( to 3 
+ (l + fc )2 + (~ 10a 4- ll)fc 2 + (10 n — 11)A; + l)m 6 J. 

Chureh 3 S 'TcheVycheff 6 ^ ^ &gree With th<3 reSUltS given by 


The betas of (2.3) are

(2.6) $_1B_1 = \dfrac{k^2 m^2}{n(1 + k)}\;\dfrac{\left[3\sigma^2 - 3 + \dfrac{1 - k}{1 + k}\,m^2\right]^2}{\left[k\sigma^2 + 1 + \dfrac{k}{1 + k}\,m^2\right]^3}$,
(2.7) $_1B_2 - 3 = \dfrac{k\left[3\sigma^4 - 6\sigma^2 + 3 - \dfrac{6(1 - k)}{1 + k}\,m^2 + \dfrac{6(1 - k)}{1 + k}\,m^2\sigma^2 + \dfrac{k^2 - 4k + 1}{(1 + k)^2}\,m^4\right]}{n\left[k\sigma^2 + 1 + \dfrac{k}{1 + k}\,m^2\right]^2}$.


$_1B_1$ vanishes if $k = 0$, $m = 0$, or $k = 1$ and $\sigma = 1$. If $k$ and $\sigma$ are constant
and $m$ approaches infinity $_1B_1$ approaches $(1 - k)^2/nk$. If $k$ and $m$ are constant
and $\sigma$ approaches infinity $_1B_1$ approaches zero. $_1B_2 - 3$ vanishes if $k = 0$,
$k = \infty$, or if $m = 0$ and $\sigma = 1$. If $k$ and $\sigma$ are constant and $m$ approaches
TABLE I 


$m_5$ and $_pm_5$ compared for four sets of values of $k$, $\sigma^2$, and $m$


Sets of values 
k a 1 m 


1/2 1/4 1.1 


1/3 1 3 2 


4 599 1.228 

n‘ n { 


89,702 39 322 

n 1 n* 





.096 ,165\ 

6-T ) 

n 7i* J 



infinity then $_1B_2 - 3$ approaches $(k^2 - 4k + 1)/nk$. If $k$ and $m$ are constant
and $\sigma$ approaches infinity then $_1B_2 - 3$ approaches $3/nk$.
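The limiting values of the betas can be checked numerically from the central moments of the two-component population. The sketch below assumes the mixture has weights $1/(1 + k)$ and $k/(1 + k)$ on $N(0, 1)$ and $N(m, \sigma^2)$, and uses the standard facts that for means of $n$ independent values $\beta_1$ and $\beta_2 - 3$ are those of the population divided by $n$. The function names are ours.

```python
def mixture_central_moments(k, sigma2, m):
    """Central moments mu2, mu3, mu4 of the mixture of N(0, 1) with weight
    1/(1+k) and N(m, sigma2) with weight k/(1+k)."""
    p = k / (1.0 + k)          # weight of the second component
    q = 1.0 / (1.0 + k)        # weight of the first component
    mu2 = q + p * sigma2 + p * q * m ** 2
    mu3 = p * q * m * (3.0 * (sigma2 - 1.0) + (q - p) * m ** 2)
    mu4 = (q * (3.0 + 6.0 * p ** 2 * m ** 2 + p ** 4 * m ** 4)
           + p * (3.0 * sigma2 ** 2 + 6.0 * q ** 2 * m ** 2 * sigma2 + q ** 4 * m ** 4))
    return mu2, mu3, mu4

def betas_of_mean(n, k, sigma2, m):
    """Return (beta_1, beta_2 - 3) for the mean of n observations."""
    mu2, mu3, mu4 = mixture_central_moments(k, sigma2, m)
    beta1 = mu3 ** 2 / mu2 ** 3 / n
    excess = (mu4 / mu2 ** 2 - 3.0) / n
    return beta1, excess

n, k = 4, 0.5
print(betas_of_mean(n, k, 2.0, 1.0e3), ((1 - k) ** 2 / (n * k), (k * k - 4 * k + 1) / (n * k)))
print(betas_of_mean(n, k, 1.0e8, 1.0), (0.0, 3.0 / (n * k)))
```

Each printed pair on the left should be close to the limiting pair on the right.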

It is of interest to compare the higher moments of (2.3) with the higher
moments calculated from the first four moments on the assumption of a Pearson
curve in place of (2.3). On this assumption

(2.8) $_pm_5 = \dfrac{2m_3(m_4^2 + 7m_2^2 m_4 - 3m_2 m_3^2)}{9m_2^3 - m_2 m_4 + 3m_3^2}$.

It is seen that (2.8) bears little resemblance to $m_5$. If we consider the
difference $_pm_5 - m_5$ we see that it is of the same order in $1/n$ as is $m_5$ and the
numerator is of the 16th degree in $k$, $m$, and $\sigma$; a very complicated locus. $m_5$ and
$_pm_5$ are compared for certain values of the parameters of (2.1) in Table I.

Table I shows that the coefficients of $1/n^3$ in the expressions for $m_5$ and $_pm_5$
differ by from two to more than 40 per cent. The coefficients of $1/n^4$ differ
even more. The assumption of Karl Pearson's curves to represent the distribution
of means of samples of $n$ from non-homogeneous populations seems to
be adequate in some cases but inadequate in others even for moderate values of
the parameters.


3. Moments of the distribution of variances. In [3] an estimate of $n$ times
the standard deviation squared is expressed as

(3.1) $W = (n - s)\bar{\sigma}_1^2 + s\bar{\sigma}_2^2 + \dfrac{s(n - s)}{n}(\bar{m}_1 - \bar{m}_2)^2$,

where a bar over a letter means an estimate of the corresponding population
parameter and where $(n - s)$ denotes the number drawn from the first component
of (2.1) and $s$ denotes the number from the second component.

For the direct calculation of the moments of the distribution of variances
it is easier not to use the distribution given in [3], but to proceed as follows. Put

$(n - s)\bar{\sigma}_1^2 = y$, $s\bar{\sigma}_2^2 = x$, $\dfrac{s(n - s)}{n}(\bar{m}_1 - \bar{m}_2)^2 = z$.

Of course, for population (2.1) $\sigma_1 = 1$, $\sigma_2 = \sigma$, $m_1 = 0$, $m_2 = m$. The variables
$x$, $y$, $z$ are all independent in the probability sense and their probability distributions
are well known. Hence the moments of

(3.2) $\dfrac{W}{n} = \dfrac{x + y + z}{n}$

can be directly calculated.
For instance, if p = 1 then 


(3.3) 




f~ [ka + 1 + 


k 

l + k 



In general, of course, the moments about the mean check with the values given 
by Church 

It is generally recommended to represent the distributions of variances of
samples from non-normal parents by Pearson's curves. Let us examine the
results of this procedure in a special case.

Suppose that the sampled population is

(3.4) $f(x) = \dfrac{1}{2\sqrt{2\pi}}\left[e^{-\frac{1}{2}x^2} + e^{-\frac{1}{2}(x - 3.4)^2}\right]$.

The first eight moments of (3.4) which are needed in the calculation of the first
four moments of the variances are:
(3.5)
$\nu_1' = 1.7000$      $\nu_5 = 0$
$\nu_2 = 3.8900$       $\nu_6 = 294.47$
$\nu_3 = 0$            $\nu_7 = 0$
$\nu_4 = 28.692$       $\nu_8 = 3{,}818.4$.


Fig. 1. Comparison of the True Distribution of the Variances of Samples of 4
Drawn from the Non-Homogeneous Population (3.4) with the Corresponding
Empirical Pearson Curve


The first four moments of the variances of samples of 4 from (3.4) are:

(3.6)
$_2M_1' = 2.918$      $_2M_3 = 4.745$
$_2M_2 = 3.396$       $_2M_4 = 41.52$.

Hence $_2B_1 = .60$ and $_2B_2 = 3.6$, $\kappa = -.87$, which calls for a Type I curve. The
equation of the curve is

( A 2,191 / \ 16 81 

1 + im) O-afW 
with its origin at its mode. The corresponding true distribution with the
origin at the beginning of the range is

(3.8) $y = e^{-2x}\left[.3989\sqrt{x} + .003550\,\sinh(3.4\sqrt{3x}) + .0005454\,\sinh(6.8\sqrt{x})\right]$.

Distribution (3.8) differs slightly from the corresponding result given in [3]
because of an error in that paper.

The two distributions are compared in Figure 1. It is seen that the two
distributions are quite different. As the number of components of distributions
similar to (3.8) increases, which is true as $n$ increases, the distributions may
be expected to become smoother and more closely representable by a single
smooth curve.
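The true distribution of the variance can be rebuilt from first principles as a check on its coefficients. The sketch below assumes the population is the equal mixture of $N(0, 1)$ and $N(3.4, 1)$ (which reproduces the moments 1.7000, 3.8900, 28.692), that the variance uses divisor $n = 4$, and that, given $s$ observations from the second component, four times the variance is non-central chi-square with 3 degrees of freedom and non-centrality $s(4 - s)m^2/4$. All function names are ours.

```python
import math

def variance_density_n4(v, m=3.4):
    """Density at v of the variance (divisor 4) of samples of 4 from the
    equal mixture of N(0, 1) and N(m, 1)."""
    total = 0.0
    for s in range(5):
        prob = math.comb(4, s) / 16.0           # chance of s draws from N(m, 1)
        lam = s * (4 - s) / 4.0 * m * m         # non-centrality of 4 * variance
        if lam == 0.0:
            total += prob * 8.0 * math.sqrt(v) / math.sqrt(2.0 * math.pi)
        else:
            total += (prob * 4.0 * math.exp(-lam / 2.0)
                      / math.sqrt(2.0 * math.pi * lam)
                      * math.sinh(2.0 * math.sqrt(lam * v)))
    return math.exp(-2.0 * v) * total

def series_coefficients(m=3.4):
    """Coefficients of sqrt(v), sinh(m sqrt(3v)) and sinh(2m sqrt(v))."""
    c0 = (2 / 16) * 8.0 / math.sqrt(2.0 * math.pi)
    lam13, lam2 = 0.75 * m * m, m * m
    c13 = (8 / 16) * 4.0 * math.exp(-lam13 / 2.0) / math.sqrt(2.0 * math.pi * lam13)
    c2 = (6 / 16) * 4.0 * math.exp(-lam2 / 2.0) / math.sqrt(2.0 * math.pi * lam2)
    return c0, c13, c2

def integrate(func, upper=40.0, steps=80000):
    """Midpoint rule on (0, upper)."""
    h = upper / steps
    return h * sum(func((i + 0.5) * h) for i in range(steps))

print(series_coefficients())
print(integrate(variance_density_n4), integrate(lambda v: v * variance_density_n4(v)))
```

The first line can be compared with the three numerical coefficients of the true distribution, and the second with total probability 1 and the first moment 2.918.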

4. Summary. The moments of the distribution of the means of samples of $n$
from a non-homogeneous population composed of two normal components are
given up to and including the fifth. This fifth moment is compared with the
fifth moment calculated on the assumption of Pearson's curves to represent
the distribution of means. The B's of the distributions of the means are discussed
in certain limiting cases. It appears that for small samples and extreme
values of the parameters, and in some cases of moderate values of the parameters,
the Pearsonian approximations give poor results.

Some identities involving the binomial coefficients are given which permit
the reduction of the moments of the distribution of means calculated directly
to forms given elsewhere [1]. A method is given for the direct calculation of
the moments of the variances of samples from a non-homogeneous population
composed of two normal components. An indication of the closeness with
which a Pearson curve can be made to fit the distribution of variances in small
samples from a non-homogeneous population is given in Figure 1.
A LEAST SQUARES ACCUMULATION THEOREM

By W. E. Bleick 


The following simple least squares theorem does not seem to have been mentioned
in the literature, and has at least one practical application.

If $A^*(x)$ and $B^*(x)$ are polynomials of the same degree which are least squares
representations of the functions $A(x)$ and $B(x)$ respectively, for the values
$x_1, x_2, \ldots, x_p$, then

(1) $\displaystyle\sum_{t=1}^{p} A^*(x_t)B(x_t) = \sum_{t=1}^{p} A(x_t)B^*(x_t) = \sum_{t=1}^{p} A^*(x_t)B^*(x_t)$.

To prove the theorem let

(2) $A^*(x) = \displaystyle\sum_{i=0}^{m} a_i x^i$

and

(3) $B^*(x) = \displaystyle\sum_{j=0}^{n} b_j x^j$.

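Then the normal equations for the determination of $a_i$ and $b_j$ are

(4) $\displaystyle\sum_{i=0}^{m} a_i s_{i+k} = \sum_{t=1}^{p} x_t^k A(x_t)$, $k = 0, 1, 2, \ldots, m$,

and

(5) $\displaystyle\sum_{j=0}^{n} b_j s_{j+k} = \sum_{t=1}^{p} x_t^k B(x_t)$, $k = 0, 1, 2, \ldots, n$,

where $s_r = \displaystyle\sum_{t=1}^{p} x_t^r$. Hence, by (2) and (5)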


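(6)
$\displaystyle\sum_{t=1}^{p} A^*(x_t)B(x_t) = \sum_{t=1}^{p}\left[\sum_{i=0}^{m} a_i x_t^i\right]B(x_t)$
$\displaystyle\qquad = \sum_{i=0}^{m} a_i \sum_{t=1}^{p} x_t^i B(x_t)$
$\displaystyle\qquad = \sum_{i=0}^{m}\sum_{j=0}^{n} a_i b_j s_{j+i}$ if $n \ge m$,
$\displaystyle\qquad = \sum_{t=1}^{p} A^*(x_t)B^*(x_t)$.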

Similarly it can be shown that

(7) $\displaystyle\sum_{t=1}^{p} A(x_t)B^*(x_t) = \sum_{t=1}^{p} A^*(x_t)B^*(x_t)$ if $m \ge n$.

Combining (6) and (7) we have

(8) $\displaystyle\sum_{t=1}^{p} A^*(x_t)B(x_t) = \sum_{t=1}^{p} A(x_t)B^*(x_t) = \sum_{t=1}^{p} A^*(x_t)B^*(x_t)$ if $m = n$.

In the particular case $A(x) = B(x)$, equation (8) gives the interesting result

(9) $\displaystyle\sum_{t=1}^{p} A^*(x_t)\left[A(x_t) - A^*(x_t)\right] = 0$.
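The theorem is easy to confirm numerically. The following sketch fits two straight lines by solving the normal equations with a small elimination routine and compares the three sums; the data (a roughly linear payment series and a 4 per cent interest function) are invented purely for illustration, and all function names are ours.

```python
def solve_linear(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    aug = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        x[i] = (aug[i][n] - sum(aug[i][j] * x[j] for j in range(i + 1, n))) / aug[i][i]
    return x

def least_squares_poly(xs, ys, degree):
    """Coefficients a_0..a_degree from the normal equations sum_i a_i s_{i+k} = sum_t x_t^k y_t."""
    s = [sum(x ** r for x in xs) for r in range(2 * degree + 1)]
    lhs = [[s[i + k] for i in range(degree + 1)] for k in range(degree + 1)]
    rhs = [sum(x ** k * y for x, y in zip(xs, ys)) for k in range(degree + 1)]
    return solve_linear(lhs, rhs)

def poly(coeffs, x):
    return sum(c * x ** i for i, c in enumerate(coeffs))

xs = list(range(1, 11))
A = [100.0 + 5.0 * x + (-1) ** x for x in xs]      # hypothetical payments
B = [1.04 ** x for x in xs]                        # hypothetical interest function
a_star = least_squares_poly(xs, A, 1)
b_star = least_squares_poly(xs, B, 1)

sum_AstarB = sum(poly(a_star, x) * bv for x, bv in zip(xs, B))
sum_ABstar = sum(av * poly(b_star, x) for x, av in zip(xs, A))
sum_AstarBstar = sum(poly(a_star, x) * poly(b_star, x) for x in xs)
residual_sum = sum(poly(a_star, x) * (av - poly(a_star, x)) for x, av in zip(xs, A))
print(sum_AstarB, sum_ABstar, sum_AstarBstar, residual_sum)
```

The first three printed sums should agree to rounding error and the fourth should be essentially zero.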


An obvious extension of equation (6) is

(10) $\displaystyle\sum_{t=1}^{p} x_t^q A^*(x_t)B(x_t) = \sum_{t=1}^{p} x_t^q A^*(x_t)B^*(x_t)$, if $n \ge m + q$,

where $q$ is a positive integer.

A practical application of (8) has been made by one large insurance company
in the case $m = n = 1$. Suppose that $A(x)$ represents an annual payment
made $x$ years ago and is an approximately linear function, and that $B(x)$ represents
a compound interest function. Then, even if $B(x)$ is not a linear function,
we may write approximately

(11)
$\displaystyle\sum_{x=1}^{p} A(x)B(x) \cong \sum_{x=1}^{p} A(x)B^*(x)$
$\displaystyle\qquad = \sum_{x=1}^{p} A(x)(b_0 + b_1 x)$
$\displaystyle\qquad = b_0 \sum_{x=1}^{p} A(x) + b_1 \sum_{x=1}^{p} xA(x)$
$\displaystyle\qquad = b_0 \sum_{x=1}^{p} A(x) + b_1 \sum_{x=1}^{p}\sum_{y=x}^{p} A(y)$.

Thus if a year-by-year record is kept of the annual payments $A(x)$, the sum
$\sum_{x=1}^{p} A(x)$, and the double sum $\sum_{x=1}^{p}\sum_{y=x}^{p} A(y)$, and if $b_0$ and $b_1$ are tabulated
functions of $p$, equation (11) affords a convenient method of evaluating $\sum A(x)B(x)$
approximately.

The author wishes to acknowledge that the case m = n = 1 of equation (8) 
and the above application were brought to his attention by John K. Dyer. 


Cooper Union, New York, N. Y. 


















PARABOLIC TEST FOR LINKAGE 

By N. L. Johnson

1. Introduction. In this paper a problem in testing statistical hypotheses
which has applications in genetics will be treated from the standpoint of the
Neyman-Pearson approach. This approach has been developed in a series of
papers, [4], [5], [6], [7], [8], [9], [10], to which the reader is referred for definitions
of the concepts of a simple statistical hypothesis, critical regions, power function
of a test with respect to alternative hypotheses, and that of a test unbiased in
the limit employed in the present paper.

2. Statement of Problem. We shall consider $M$ independent experiments,
which will each yield results falling into one of the four categories described by
the possible combinations of the 4 events $a$, not-$a$ (or $\bar{a}$), $b$, and not-$b$ (or $\bar{b}$)
as set up in the following table.

                a          not-a

   b           p_1         p_2          P_1
   not-b       p_3         p_4          1 - P_1

               P_2         1 - P_2      1


We shall assume that the marginal probabilities are known and have values
$P_1$, $1 - P_1$, $P_2$, $1 - P_2$ as shown in the table. Thus $P_1$ = probability of
event $b$ happening whether event $a$ occurs or not. It is obvious that if, further,
the probability of a result falling in any one category or cell is fixed, then the
other three cell probabilities will also be fixed. For if $p_1$, $p_2$, $p_3$, $p_4$ be the
four cell probabilities as shown in the table above, we must have

(1) $p_1 + p_2 = P_1$; $p_1 + p_3 = P_2$; $p_3 + p_4 = 1 - P_1$.

Hence the values of the cell probabilities will be determined by a single parameter
$\theta$, say, as follows:

(2)
$p_1 = P_1 P_2 e^{\theta}$,            $p_2 = P_1(1 - P_2 e^{\theta})$,
$p_3 = P_2(1 - P_1 e^{\theta})$,      $p_4 = 1 - P_1 - P_2 + P_1 P_2 e^{\theta}$.
The range of values which $\theta$ may take for the set of admissible hypotheses is
found from the conditions

(3) $0 < p_i < 1$ $(i = 1, 2, 3, 4)$

to be

(4) $-\infty < \theta < \min(-\log P_1, -\log P_2)$ if $P_1 + P_2 \le 1$

but

(5) $\log(P_1^{-1} + P_2^{-1} - P_1^{-1}P_2^{-1}) < \theta < \min(-\log P_1, -\log P_2)$ if $P_1 + P_2 > 1$.
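The parametrization of the cell probabilities and the admissible range of $\theta$ can be written out directly. This is a small sketch of the relations $p_1 = P_1P_2e^{\theta}$, $p_2 = P_1 - p_1$, $p_3 = P_2 - p_1$, $p_4 = 1 - P_1 - P_2 + p_1$; the function names and the numerical margins are illustrative assumptions.

```python
import math

def cell_probabilities(P1, P2, theta):
    """Cell probabilities (p1, p2, p3, p4) for margins P1 (event b), P2 (event a)."""
    e = math.exp(theta)
    p1 = P1 * P2 * e
    return p1, P1 * (1.0 - P2 * e), P2 * (1.0 - P1 * e), 1.0 - P1 - P2 + p1

def theta_range(P1, P2):
    """Open interval of theta for which all four cell probabilities lie in (0, 1)."""
    upper = min(-math.log(P1), -math.log(P2))
    if P1 + P2 <= 1.0:
        lower = -math.inf
    else:
        lower = math.log(1.0 / P1 + 1.0 / P2 - 1.0 / (P1 * P2))
    return lower, upper

P1, P2 = 0.7, 0.6
print(cell_probabilities(P1, P2, 0.0))   # independence: products of the margins
print(theta_range(P1, P2))
```

At $\theta = 0$ the cell probabilities reduce to products of the marginal probabilities, which is the hypothesis of independence.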

The hypothesis tested, $H_0$, is that $\theta = 0$, i.e. that the events $a$ and $b$ are
independent. It will be noticed that $H_0$ is a simple hypothesis, since it specifies
the probability law of the observed variables completely. In fact, if $m_i$ be
the number of results out of our $M$ experiments which are in the $i$th category,
then $m_1$, $m_2$, $m_3$, $m_4$ are our observed variables, and we have

(6) $P\{m_1 = m_1', m_2 = m_2', m_3 = m_3', m_4 = m_4' \mid H_0\} = \dfrac{M!}{m_1'!\,m_2'!\,m_3'!\,m_4'!}\;p_{01}^{m_1'}\,p_{02}^{m_2'}\,p_{03}^{m_3'}\,p_{04}^{m_4'}$,

where $p_{0i}$ is the value of $p_i$ when $\theta = 0$.

This is the conceptual model used in testing for linkage in two pairs of genes; 
H 0 corresponds to the hypothesis “there is no linkage.” Fuller explanations 
are given by Fisher [3]. It should be noted, however, that Fisher uses a pa¬ 
rameter 9 corresponding to in this paper. 


3. Basis of Selection of Test. The question now arises: what test shall we
choose for the hypothesis $H_0$? That is, what should the critical region $w$ be
to give us results as satisfactory as possible? The main aim must be to avoid
errors, both of first and second kind, as far as possible. The first kind of error
is subject to control, since the probability of the sample point $E$ falling in $w$
when $H_0$ is true (which we shall denote by $P\{E \in w \mid H_0\}$) can be determined
approximately, $H_0$ being simple. The critical region $w$ is therefore chosen, if
possible, to give a definite level of significance to the test associated with it.
However, there will usually be many regions which will do this, and in
order to decide which of them give more satisfactory results we consider
$(1 - P\{E \in w \mid H\})$, i.e. the probability of the second kind of error with respect
to an alternative hypothesis $H$, the first kind of error being fixed.

In the present case $H$ will be determined by $\theta$ and so we may put
$P\{E \in w \mid H\} = \beta(w \mid \theta)$, where $\beta(w \mid \theta)$, considered as a function of $\theta$, will be
the power function of the test associated with the critical region $w$. We want
$w$ to be such that $\beta(w \mid 0) = \alpha$, $\alpha$ being the fixed level of significance, while
$\beta(w \mid \theta)$ is as large as possible.

It is also desirable that we should accept the hypothesis $H_0$ more often when
it is true than when any one of the alternative hypotheses ($H$) is true. Expressed
symbolically, this means that

(7) $\beta(w \mid 0) < \beta(w \mid \theta)$ for all $\theta \ne 0$.

Any test satisfying the last condition is said to be unbiased.

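If $\beta$ and $\dfrac{\partial\beta}{\partial\theta}$ are each continuous and differentiable functions of $\theta$, and we
consider only those alternative hypotheses specified by suitably small values
of $\theta$, sufficient conditions for the test to be unbiased will be

(8) $\left[\dfrac{\partial\beta}{\partial\theta}\right]_{\theta=0} = 0$,

(9) $\left[\dfrac{\partial^2\beta}{\partial\theta^2}\right]_{\theta=0} > 0$.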

According to the terminology recently adopted by Daly [1], the tests of
which it is known only that they satisfy (8) and (9) are called locally unbiased.
If a region $w$ could be found such that, $v$ being any other region for which

(10) $\beta(w \mid 0) = \beta(v \mid 0)$, then $\beta(w \mid \theta) > \beta(v \mid \theta)$

for all $\theta \ne 0$, this would give a test which would be the best with respect to any
alternative hypothesis. However, it has been shown by Neyman [4] that under
certain conditions, which many probability laws satisfy, such a test will not
exist. An attempt is therefore made to control the power of the test with
respect to hypotheses specifying values of $\theta$ near to 0, hoping that the powers
of the tests so obtained with respect to the other hypotheses will behave in a
satisfactory manner. Thus Neyman and Pearson [9] define an "unbiased test
of Type A" as a test corresponding to a critical region $w$ such that if $v$ be any
other region in the sample space $W$ for which

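(11) $\beta(w \mid 0) = \beta(v \mid 0) = \alpha$

and

(12) $\left[\dfrac{\partial\beta(w \mid \theta)}{\partial\theta}\right]_{\theta=0} = \left[\dfrac{\partial\beta(v \mid \theta)}{\partial\theta}\right]_{\theta=0} = 0$,

then

(13) $\left[\dfrac{\partial^2\beta(w \mid \theta)}{\partial\theta^2}\right]_{\theta=0} > \left[\dfrac{\partial^2\beta(v \mid \theta)}{\partial\theta^2}\right]_{\theta=0}$.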

In the problem which I am treating the conditions

(14) $\beta(w \mid 0) = \alpha$; $\left[\dfrac{\partial\beta}{\partial\theta}\right]_{\theta=0} = 0$

implied by (11) and (12) above cannot, in general, be satisfied, since the distribution
is discontinuous, i.e. $P\{E \in w \mid H_0\}$ is a discontinuous function of $w$ and, in
fact, for a given sample size, has only a finite number of possible values, none
of which need be equal to $\alpha$.
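The discreteness of the attainable significance levels is easy to see by enumeration for a small sample. The sketch below lists the null probabilities of the nested regions $\{m_1 \ge c\}$; this family of regions is chosen only for illustration and is not the critical region derived in the paper, and the function names are ours.

```python
from math import factorial

def null_probability(m, p0):
    """Multinomial probability of the cell counts m under cell probabilities p0."""
    coef = factorial(sum(m))
    for mi in m:
        coef //= factorial(mi)
    prob = float(coef)
    for mi, pi in zip(m, p0):
        prob *= pi ** mi
    return prob

def sample_points(M):
    """All (m1, m2, m3, m4) with non-negative integer entries summing to M."""
    for m1 in range(M + 1):
        for m2 in range(M + 1 - m1):
            for m3 in range(M + 1 - m1 - m2):
                yield (m1, m2, m3, M - m1 - m2 - m3)

def attainable_sizes(M, P1, P2):
    """P{E in w | H0} for the illustrative regions w_c = {m1 >= c}, c = 0..M+1."""
    p0 = (P1 * P2, P1 * (1 - P2), P2 * (1 - P1), (1 - P1) * (1 - P2))
    points = [(m, null_probability(m, p0)) for m in sample_points(M)]
    return [sum(pr for m, pr in points if m[0] >= c) for c in range(M + 2)]

sizes = attainable_sizes(10, 0.5, 0.5)
print([round(s, 4) for s in sizes])
```

With $M = 10$ and $P_1 = P_2 = 0.5$ none of the listed values equals a conventional level such as 0.05; the nearest ones lie on either side of it.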

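However, it may be possible to find a test of $H_0$ of a type called "unbiased
in the limit (as $M$ increases)," based on the limiting form of the multinomial
distribution, which is a continuous function of $w$. The definition [6] of a test
"unbiased in the limit" will be taken as follows:

Suppose we have a sequence $\{w_M\}$ of critical regions, $w_M$ corresponding to a
sample of size $M$, such that

(i) for any $M$, if $v_M$ be any region for which

(15) $\beta(v_M \mid 0) = \beta(w_M \mid 0)$

and

(16) $\left[\dfrac{\partial\beta(w_M \mid \theta)}{\partial\theta}\right]_{\theta=0} = \left[\dfrac{\partial\beta(v_M \mid \theta)}{\partial\theta}\right]_{\theta=0}$,

then

(17) $\left[\dfrac{\partial^2\beta(w_M \mid \theta)}{\partial\theta^2}\right]_{\theta=0} > \left[\dfrac{\partial^2\beta(v_M \mid \theta)}{\partial\theta^2}\right]_{\theta=0}$;

(ii)

(18) $\lim_{M \to \infty} \beta(w_M \mid 0) = \alpha$;

(iii) if

(19) $\vartheta = \sqrt{M}(\theta - 0) = \sqrt{M}\,\theta$,

(20) $\lim_{M \to \infty} \left[\dfrac{\partial\beta(w_M \mid \theta)}{\partial\vartheta}\right]_{\vartheta=0} = 0$;

then the test associated with this sequence of critical regions is unbiased in the
limit. I shall call such a test a test of type $A_\infty$.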

The reason for using $\vartheta$ as the variable in condition (19) above is that, unless
our sequence of critical regions has been very badly or unluckily chosen, we
shall have

(21) $\lim_{M \to \infty} \beta(w_M \mid \theta) = 1$ $(\theta \ne 0)$

while, by (18), $\lim_{M \to \infty} \beta(w_M \mid 0) = \alpha$ and so, in general, $\lim_{M \to \infty} \dfrac{\partial\beta}{\partial\theta}$ will not
exist at $\theta = 0$. Hence we introduce $\vartheta$, termed the normalized error, and, keeping
$\vartheta$ constant (and hence making $\theta$ tend to zero) we form $\lim_{M \to \infty} \dfrac{\partial\beta}{\partial\vartheta}$.

In the next section will be obtained a test of $H_0$ which is of type $A_\infty$.


4. Derivation of Test. The composition of a sample of $M$ experiments is
uniquely determined by the numbers of results $m_1$, $m_2$, $m_3$ falling in the 1st,
2nd and 3rd categories respectively. Thus any sample may be represented by a
point $E(m)$ in a three-dimensional sample space $W(m)$ with coordinate axes of
$m_1$, $m_2$, and $m_3$. It will occasionally be convenient to represent the sample
by a point in a three-dimensional space with other axes. The following sample
spaces will be used.



$W(m)$: space with coordinate axes of $m_1$, $m_2$, $m_3$
$W(d)$: space with coordinate axes of $d_1$, $d_2$, $d_3$
$W(x)$: space with coordinate axes of $x_1$, $x_2$, $x_3$
$W(n)$: space with coordinate axes of $n_1$, $n_2$, $n_3$

where

(22) $d_i = m_i - Mp_{0i}$ $(i = 1, 2, 3, 4)$

(23) $x_i = (m_i - Mp_{0i})/(Mp_{0i})^{1/2}$ $(i = 1, 2, 3, 4)$

(24) $n_i = m_i/M$ $(i = 1, 2, 3, 4)$.


I shall use $w_M$ indifferently to denote "the critical region corresponding to
sample size $M$" in any of the four sample spaces above, and $E$ indifferently to
denote corresponding positions of the sample point in any of the four sample
spaces, except in cases where confusion might arise, where I shall use $w_M(m)$,
$w_M(d)$, $w_M(x)$, $w_M(n)$ and $E(m)$, $E(d)$, $E(x)$, $E(n)$. When necessary the size of
sample with which a point $E$ is associated will be denoted by a subscript; e.g. $E_M$.

In finding a test of type 4, w shall need to consider the quantities 


dt } 3 9 2 




, where & = 6 y/M. 


The probability law of the observed values $m_1$, $m_2$, $m_3$ is discontinuous with
respect to the points of the sample space $W(m)$. For if $E^0$ be a point which
corresponds to integral values $m_1^0$, $m_2^0$, $m_3^0$ of $m_1$, $m_2$, $m_3$, subject to the
restrictions

(25) $0 \le m_i^0$ $(i = 1, 2, 3)$

(26) $0 \le \displaystyle\sum_{i=1}^{3} m_i^0 \le M$

then

(27) $P\{E_M = E^0 \mid \theta = 0\} = \dfrac{M!}{m_1^0!\,m_2^0!\,m_3^0!\,m_4^0!}\;p_{01}^{m_1^0}\,p_{02}^{m_2^0}\,p_{03}^{m_3^0}\,p_{04}^{m_4^0}$

where

(28) $\displaystyle\sum_{i=1}^{4} m_i^0 = M$

(29) $p_{01} = P_1 P_2$, $p_{02} = P_1(1 - P_2)$, $p_{03} = P_2(1 - P_1)$, $p_{04} = (1 - P_1)(1 - P_2)$

while if $E^0$ be not such a point

(30) $P\{E_M = E^0 \mid \theta\} = 0$

whatever the value of $\theta$ may be. Now

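(31) $\beta(w_M \mid \theta) = \displaystyle\sum_{w_M} \dfrac{M!}{m_1!\,m_2!\,m_3!\,m_4!}\;p_1^{m_1}\,p_2^{m_2}\,p_3^{m_3}\,p_4^{m_4}$

where $p_1$, $p_2$, $p_3$, $p_4$ are as defined in (2) above, and $\sum_{w_M}$ denotes a finite
summation over all points $E'$ in $w_M$ for which $P\{E_M = E' \mid \theta\} \ne 0$. Differentiating
each side of (31) with respect to $\theta$, we get

(32) $\left[\dfrac{\partial\beta(w_M \mid \theta)}{\partial\theta}\right]_{\theta=0} = \displaystyle\sum_{w_M} \dfrac{M!}{m_1!\,m_2!\,m_3!\,m_4!}\;p_{01}^{m_1}\,p_{02}^{m_2}\,p_{03}^{m_3}\,p_{04}^{m_4}\left[\dfrac{m_1(1 - P_1 - P_2) - m_2 P_2 - m_3 P_1 + MP_1 P_2}{(1 - P_1)(1 - P_2)}\right]$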


a s 0(u>* 

afl 2 


«1 fl) ~l _ y 

P J*-o ^ 


Ml pMfl^ 
wiilmjlwijlmj 


( 33 ) “ Pl “ Pi) ~ mPi " msPl + M?lP2 ) 2 

- (miPiPj(l - Pi - P t ) + wijP s (l - Pi - PiPt) 

+ wisPxd - P 2 - PiP 2 ) - MPiPt(l - Pj)(l - Pt))]. 

Theorem 1. The sequence of critical regions (w M ) defined, by 

(34) v + Bu 2 > A in w M v -f Bu 1 < A elsewhere, 


«« .. _ *i(PiPi)*(l - Pi - Pi) - %Pi(l - P 8 ) ! P 2 - *»Pi(l - Pi)*Pi 
( (PiPid-poa-p*)}* 

Pi(l - Pi)(2Pi - l)|n(PiPi)‘ + iaPi(l - Pi) 4 ) 
m .. _ + P*d - Pi)(2Pi - l){a!i(PiPi)* + XtP\(l - P 2 ) 4 ) 

[PiPid-POd - Pi){Pi(l-Px)d - 2P 2 ) 2 + P 2 (1 - P.)(l - 2Pi) 2 )] j 

(37 } d r MPiPid - pod - po _f 

LPid - Pi)( 1 - 2P 8 ) 2 + P,(l - P 2 )(l - 2Pi) 2 J 
+»


(38) 


If 

2tt i-« 


,-J U 2 


f c~* j2 dv\du = a 

Ja-b u s j 


— JVfpot 

and Xi = (jifp as defined above , is associated with a test of the hypothesis 

Ho(S = 0) which is unbiased in the limit, of type A ^ at level of significance a, 
provided that 


(39) 


0 < P t < 1 


and Pi and Pi are not both equal to 
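In Lemma 1 of the Appendix (paragraph 9), put $s = 2$, and let

$f_1$ = individual members of the summation for $\beta(w_M \mid 0)$ (see (31)),

$f_2$ = individual members of the summation for $\left.\dfrac{\partial \beta(w_M \mid \theta)}{\partial \theta}\right|_{\theta=0}$ (see (32)),

$f_0$ = individual members of the summation for $\left.\dfrac{\partial^2 \beta(w_M \mid \theta)}{\partial \theta^2}\right|_{\theta=0}$ (see (33)).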


From Lemma 1 we see that the regions $(w)$ defined by

(40)  $f_0 \ge a_1 f_1 + a_2 f_2$ in $w$, $\qquad f_0 < a_1 f_1 + a_2 f_2$ elsewhere

will maximize $\sum_w f_0$ with respect to all regions for which $\sum_w f_1$ and $\sum_w f_2$ are fixed. ($a_1$ and $a_2$ are arbitrary constants depending on the fixed values of $\sum_w f_1$ and $\sum_w f_2$.) Hence any sequence of critical regions $(w_M)$ defined by


(41)  $[m_1(1 - P_1 - P_2) - m_2 P_2 - m_3 P_1 + M P_1 P_2]^2$
$\qquad - \{m_1 P_1 P_2 (1 - P_1 - P_2) + m_2 P_2 (1 - P_1 - P_1 P_2) + m_3 P_1 (1 - P_2 - P_1 P_2) - M P_1 P_2 (1 - P_1)(1 - P_2)\}$
$\qquad \ge a_1 \{m_1(1 - P_1 - P_2) - m_2 P_2 - m_3 P_1 + M P_1 P_2\} + a_2$

in $w_M$, will satisfy conditions (i) given above in the definition of a test of type $A_\infty$. The inequality (41) may be rewritten

(42)  $\{m_1(1 - P_1 - P_2) - m_2 P_2 - m_3 P_1 + M P_1 P_2 - a_3\}^2 - [P_2(1 - P_1)\{m_2 - M P_1(1 - P_2)\} + P_1(1 - P_2)\{m_3 - M P_2(1 - P_1)\}] \ge a_4$,

the $a_i$'s being arbitrary constants.

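Also, by Theorem 1 of the Appendix, we have that, for any given $\epsilon > 0$ and any region $w$, there is a number $M_\epsilon$, independent of $w$ and such that for all $M > M_\epsilon$,

(43)  $|\beta(w \mid 0) - I(w)| < \epsilon$

where

(44)  $I(w) = \dfrac{1}{(2\pi)^{3/2}\, p_{04}^{1/2}} \displaystyle\iiint_{w} e^{-\chi_0^2/2}\, dx_1\, dx_2\, dx_3$

and

(45)  $\chi_0^2 = \sum_{i=1}^{3} x_i^2 (1 + p_{0i}\, p_{04}^{-1}) + 2 \sum_{i<j\le 3} x_i x_j (p_{0i}\, p_{0j})^{1/2}\, p_{04}^{-1}$.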

We will now apply a transformation to the coordinates $m_1, m_2, m_3$ which will

(a) transform inequality (42) into a simpler form, 

( b ) transform I(w) into a form to which the tables of the Normal Probability 
Integral may easily be applied for purposes of calculation. 

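This transformation is

(46)  $u = \dfrac{x_1 (P_1 P_2)^{1/2}(1 - P_1 - P_2) - x_2 P_1^{1/2}(1 - P_2)^{1/2} P_2 - x_3 P_2^{1/2}(1 - P_1)^{1/2} P_1}{\{P_1 P_2 (1 - P_1)(1 - P_2)\}^{1/2}}$

(47)  $v = \dfrac{P_1(1 - P_1)(2P_2 - 1)\{x_1 (P_1 P_2)^{1/2} + x_3 P_2^{1/2}(1 - P_1)^{1/2}\} + P_2(1 - P_2)(2P_1 - 1)\{x_1 (P_1 P_2)^{1/2} + x_2 P_1^{1/2}(1 - P_2)^{1/2}\}}{[P_1 P_2 (1 - P_1)(1 - P_2)\{P_1(1 - P_1)(1 - 2P_2)^2 + P_2(1 - P_2)(1 - 2P_1)^2\}]^{1/2}}$

(48)  $t = \dfrac{(2P_1 - 1)\{x_1 (P_1 P_2)^{1/2} + x_3 P_2^{1/2}(1 - P_1)^{1/2}\} - (2P_2 - 1)\{x_1 (P_1 P_2)^{1/2} + x_2 P_1^{1/2}(1 - P_2)^{1/2}\}}{\{P_1(1 - P_1)(1 - 2P_2)^2 + P_2(1 - P_2)(1 - 2P_1)^2\}^{1/2}}$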

This is a proper transformation, since under the conditions of the theorem $0 < P_i < 1$ and $P_1$ and $P_2$ are not both $\tfrac12$, and the Jacobian

(49)  $J = \dfrac{\partial(u, v, t)}{\partial(x_1, x_2, x_3)} = \dfrac{1}{p_{04}^{1/2}}$

is non-zero and of constant sign.
Also

(50)  $\chi_0^2 = u^2 + v^2 + t^2$.

Hence

(51)  $I(w) = \dfrac{1}{(2\pi)^{3/2}} \displaystyle\iiint_{w(u,v,t)} e^{-(u^2 + v^2 + t^2)/2}\, du\, dv\, dt$.

The inequality (42) is transformed into an inequality of form $B(u - a_5)^2 + v \ge A$ where $B$ has the value stated above; $a_5$ and $A$ being at present arbitrary constants.

Therefore we may put $a_5 = 0$ and define $A$ by the equation

(52)  $\dfrac{1}{2\pi}\displaystyle\int_{-\infty}^{+\infty} e^{-u^2/2}\left\{\int_{A - Bu^2}^{\infty} e^{-v^2/2}\,dv\right\} du = \alpha$

and conclude that the sequence of critical regions $(w_M)$ defined by the inequalities

(53)  $Bu^2 + v \ge A$ in $w_M$, $\qquad Bu^2 + v < A$ elsewhere

will satisfy conditions (i) for a test of type $A_\infty$.
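From (51) and (52)

(54)  $I(w_M) = \dfrac{1}{2\pi}\displaystyle\int_{-\infty}^{+\infty} e^{-u^2/2}\left\{\int_{A - Bu^2}^{\infty} e^{-v^2/2}\,dv\right\} du = \alpha$.

By Theorem 1 of the Appendix, as mentioned above, we have

(55)  $|\beta(w_M \mid 0) - I(w_M)| < \epsilon$ for all $M > M_\epsilon$,

i.e.

(56)  $|\beta(w_M \mid 0) - \alpha| < \epsilon$ for all $M > M_\epsilon$,

and so

(57)  $\beta(w_M \mid 0) \to \alpha$ as $M \to \infty$.

Thus the sequence of critical regions $(w_M)$ satisfies the condition (ii) of the definition of a test of type $A_\infty$.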

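If $w$ be any region defined by inequalities on $u$ and $v$ only (as are the regions $w_M$) then, as a special case of Theorem 1 of the Appendix, we have that for any $\epsilon > 0$ there exists a number $M_\epsilon$ such that for all $M > M_\epsilon$

(58)  $\left| P_M(w) - \dfrac{1}{2\pi}\displaystyle\iint_{w(u,v)} e^{-(u^2 + v^2)/2}\, du\, dv \right| < \epsilon$

where $P_M(w) = P\{E_M \in w \mid 0\}$.

By (31) and (32), noting that $\partial/\partial\theta = \sqrt{M}\;\partial/\partial\vartheta$, we have

(59)  $\left.\dfrac{\partial \beta(w \mid \vartheta)}{\partial \vartheta}\right|_{\vartheta=0} = \sum_w f_1(u, v)\cdot u\cdot (P_1 P_2)^{1/2}(1 - P_1)^{-1/2}(1 - P_2)^{-1/2} = \sum_w f_1(u, v)\cdot uk$

where $k = (P_1 P_2)^{1/2}(1 - P_1)^{-1/2}(1 - P_2)^{-1/2} > 0$.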

By Theorem 1 of the Appendix, as last stated above, we have

(60)  $f_1(u, v) = \dfrac{1}{2\pi}\,\Delta u\,\Delta v\; e^{-(u^2 + v^2)/2}\,(1 + R_M)$

where for convenience we have written $\Delta u$, $\Delta v$ for $\Delta^{(M)}u$, $\Delta^{(M)}v$, the units of $u$ and $v$ when the sample size is $M$, and $R_M$ for $R_M(u, v)$, which has the property that

(61)  $\sum_w R_M(u, v)\,\Delta^{(M)}u\,\Delta^{(M)}v\; e^{-(u^2 + v^2)/2} \to 0$

uniformly with respect to $w$ as $M \to \infty$.

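Now let $w^+$ denote that part of $w$ where $R_M \ge 0$ and $w^-$ that part of $w$ where $R_M < 0$. Then

(62)  $\sum_{w^+} k u f_1(u, v) = k \sum_{w^+} \dfrac{\Delta u\,\Delta v}{2\pi}\, u\, e^{-(u^2 + v^2)/2} + k \sum_{w^+} \dfrac{\Delta u\,\Delta v}{2\pi}\, u R_M\, e^{-(u^2 + v^2)/2}$.

Let

(63)  $S_M^+ = k \sum_{w^+} \dfrac{\Delta u\,\Delta v}{2\pi}\, u R_M\, e^{-(u^2 + v^2)/2}$.

By Schwarz's inequality

(64)  $|S_M^+| \le k \left\{\sum_{w^+} \dfrac{\Delta u\,\Delta v}{2\pi}\, u^2 R_M\, e^{-(u^2 + v^2)/2}\right\}^{1/2} \left\{\sum_{w^+} \dfrac{\Delta u\,\Delta v}{2\pi}\, R_M\, e^{-(u^2 + v^2)/2}\right\}^{1/2}$.

But

(65)  $\sum_w u^2 f_1(u, v) = \sum_w \dfrac{\Delta u\,\Delta v}{2\pi}\, u^2 e^{-(u^2 + v^2)/2} + \sum_w \dfrac{\Delta u\,\Delta v}{2\pi}\, u^2 R_M\, e^{-(u^2 + v^2)/2}$.

Now $u^2 f_1(u, v) \ge 0$ and $\sum_W u^2 f_1(u, v)$ is finite (since $u^2$ is a homogeneous function of second degree in the $x_i$'s and so has a finite expectation) and is bounded as $M \to \infty$. Hence $\sum_w u^2 f_1(u, v)$ is finite and bounded as $M \to \infty$. Further, as $M \to \infty$

(66)  $\sum_w \dfrac{\Delta u\,\Delta v}{2\pi}\, u^2 e^{-(u^2 + v^2)/2} \to \dfrac{1}{2\pi}\displaystyle\iint_w u^2 e^{-(u^2 + v^2)/2}\, du\, dv$.

Hence $\sum_w \dfrac{\Delta u\,\Delta v}{2\pi}\, u^2 R_M\, e^{-(u^2 + v^2)/2}$ is bounded as $M \to \infty$. From this result, together with (61) and (64), it follows that $S_M^+ \to 0$ as $M \to \infty$ uniformly with respect to $w$. Putting

(67)  $S_M^- = k \sum_{w^-} \dfrac{\Delta u\,\Delta v}{2\pi}\, u R_M\, e^{-(u^2 + v^2)/2}$

it will follow in a similar manner that $S_M^- \to 0$ as $M \to \infty$ uniformly with respect to $w$. Hence

(68)  $\left.\dfrac{\partial \beta(w \mid \vartheta)}{\partial \vartheta}\right|_{\vartheta=0} = \sum_w k u f_1(u, v) = k \sum_w \dfrac{\Delta u\,\Delta v}{2\pi}\, u\, e^{-(u^2 + v^2)/2} + S_M^*$

where $S_M^* = S_M^+ + S_M^-$ and so $S_M^* \to 0$ as $M \to \infty$ uniformly with respect to $w$.
Hence whatever be $\epsilon > 0$, there is a number $M'_\epsilon$ such that for all $M > M'_\epsilon$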


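(69)  $\left| \left.\dfrac{\partial \beta(w \mid \vartheta)}{\partial \vartheta}\right|_{\vartheta=0} - \dfrac{k}{2\pi}\displaystyle\iint_w u\, e^{-(u^2 + v^2)/2}\, du\, dv \right| < \epsilon$

whatever be the region $w$. In particular we may take $w = w_M$, and then we have

(70)  $\dfrac{k}{2\pi}\displaystyle\iint_{w_M} u\, e^{-(u^2 + v^2)/2}\, du\, dv = \dfrac{k}{2\pi}\int_{-\infty}^{+\infty}\left\{u\, e^{-u^2/2}\int_{A - Bu^2}^{\infty} e^{-v^2/2}\, dv\right\} du = 0$

and so

(71)  $\left| \left.\dfrac{\partial \beta(w_M \mid \vartheta)}{\partial \vartheta}\right|_{\vartheta=0} \right| < \epsilon$ for all $M > M'_\epsilon$,

i.e.,

(72)  $\lim_{M \to \infty} \left.\dfrac{\partial \beta(w_M \mid \vartheta)}{\partial \vartheta}\right|_{\vartheta=0} = 0$.

Hence the sequence of critical regions $(w_M)$ satisfies condition (iii) for a test of type $A_\infty$. This completes the proof of Theorem 1.

In the above theorem we have found a test which is unbiased in the limit for all cases except that for which $P_1 = P_2 = \tfrac12$. The following theorem derives the test appropriate to this special case, and it is found that in this instance the test takes a very simple form.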


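Theorem 2. If $P_1 = P_2 = \tfrac12$ the sequence of critical regions $(w_M)$ defined by

(73)  $|x_2 + x_3| \ge a$ in $w_M$, $\qquad |x_2 + x_3| < a$ elsewhere

where

(74)  $\dfrac{1}{\sqrt{2\pi}}\displaystyle\int_{-a}^{+a} e^{-x^2/2}\, dx = 1 - \alpha$

(75)  $x_i = \dfrac{m_i - \tfrac14 M}{\tfrac12 M^{1/2}}$

is associated with a test of the hypothesis $H_0(\vartheta = 0)$ of type $A_\infty$ at level of significance $\alpha$.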

The proof of this theorem follows the same lines as that of Theorem 1 as far as inequality (42). On putting $P_1 = P_2 = \tfrac12$ in (42) we get

(76)  $\left(-\tfrac12(m_2 + m_3) + \tfrac14 M - a_3\right)^2 - \tfrac14\left(m_2 + m_3 - \tfrac12 M\right) \ge a_4$

i.e.,

(77)  $(x_2 + x_3 - a_5)^2 \ge a_6$.

The critical region $w_M$ defined in the statement of the theorem is of this form with $a_5 = 0$ and $a_6 = a^2$.

Hence the sequence of critical regions $(w_M)$ satisfies conditions (i) of the definition of a test of type $A_\infty$. The sequence of critical regions may also be shown to satisfy conditions (ii) and (iii) for a test of type $A_\infty$ by following the lines of the proof of Theorem 1 and noting that $x_2 + x_3 = 2M^{-1/2}(m_2 + m_3 - \tfrac12 M)$ tends to be distributed as a unit normal deviate as $M \to \infty$.

On account of the shape of the critical regions in the general case, I shall for the remainder of this paper call the tests derived in the above theorems the parabolic tests for the cases considered.


5. Application of the Parabolic Tests. For practical purposes the formulae derived above are inconvenient to use. I will therefore express them in terms of the deviations of the observed frequencies in the four cells from the frequencies "expected" when the hypothesis $H_0(\theta = 0)$ is true, i.e. in terms of the variables $d_i$, where

(78)  $d_i = m_i - M p_{0i} = x_i (M p_{0i})^{1/2} \quad (i = 1, 2, 3, 4)$.

The test then becomes "reject the hypothesis $H_0$ at level of significance $\alpha$ if $v + Bu^2 > A$" where


(79)  $u = \dfrac{d_1(1 - P_1 - P_2) - d_2 P_2 - d_3 P_1}{\{M P_1 P_2 (1 - P_1)(1 - P_2)\}^{1/2}}$

(80)  $v = \dfrac{P_1(1 - P_1)(2P_2 - 1)(d_1 + d_3) + P_2(1 - P_2)(2P_1 - 1)(d_1 + d_2)}{[M P_1 P_2 (1 - P_1)(1 - P_2)\{P_1(1 - P_1)(2P_2 - 1)^2 + P_2(1 - P_2)(2P_1 - 1)^2\}]^{1/2}}$

(81)  $\dfrac{1}{2\pi}\displaystyle\int_{-\infty}^{+\infty} e^{-u^2/2}\left\{\int_{A - Bu^2}^{\infty} e^{-v^2/2}\,dv\right\} du = \alpha$

(82)  $B = \left[\dfrac{M P_1 P_2 (1 - P_1)(1 - P_2)}{P_1(1 - P_1)(1 - 2P_2)^2 + P_2(1 - P_2)(1 - 2P_1)^2}\right]^{1/2}$

except when $P_1 = P_2 = \tfrac12$. In the latter case reject the hypothesis $H_0$ if

(83)  $\dfrac{|d_2 + d_3|}{\tfrac12 M^{1/2}} > a$

where

(84)  $\dfrac{1}{\sqrt{2\pi}}\displaystyle\int_{-a}^{+a} e^{-x^2/2}\, dx = 1 - \alpha$.

The application of this last case ($P_1 = P_2 = \tfrac12$) is straightforward. $a$ may be found from the tables of the Normal Probability Integral, $d_2$ and $d_3$ may be calculated from the data, and we may then see whether the inequality (83) is satisfied, and so assess our judgment of the hypothesis $H_0$.

TABLE I

Significance of Symbols

A and B are connected by the following relation.

$\dfrac{1}{2\pi}\displaystyle\int_{-\infty}^{+\infty} e^{-u^2/2}\left\{\int_{A - Bu^2}^{\infty} e^{-v^2/2}\,dv\right\} du = \alpha$

Table Ia: $\alpha = 0.05$, $\rho_{.05} = A - 3.8414588\,B$
Table Ib: $\alpha = 0.01$, $\rho_{.01} = A - 6.6348966\,B$

(An entry shown as - is not legible in this copy.)

   B       rho_.05    rho_.01
   0       1.6449     2.3263
   1.00    -          0.289
   1.25    -          .231
   1.50    .212       .192
   1.75    .181       .165
   2.00    .158       .144
   2.25    .141       .128
   2.50    .127       .115
   2.75    .116       .105
   3.00    -          .096
   3.25    -          .089
   3.50    -          .082
   3.75    -          .077
   4.00    -          .072
   5       -          .058
   6       -          .048
   7       -          .041
   8       -          .036
   9       -          .032
  10       -          .029
  15       -          .020
  20       -          .014
  30       -          .009
  40       -          .007
  50       -          .006


The general case is also straightforward, except for the determination of $A$ from equation (81). To facilitate this I have constructed Tables Ia and Ib. These tables correspond respectively to significance levels .05, .01, and from them the value of $A$ corresponding to a given value of $B$ may be calculated. The quantity tabled, ($\rho$), is the difference between $A$ and a multiple¹ (constant for a given level of significance and given with the table to which it applies) of $B$. To find $A$, therefore, $B$ is calculated, multiplied by the appropriate constant, and added to the quantity in the table corresponding to $B$. For large values of $B$ (40 and over) $\rho$ is small, and $A$ may be taken equal to the constant multiple of $B$.
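Where the tables are not at hand, $A$ may also be obtained directly from relation (81) by numerical integration. The following is a minimal sketch only, assuming Simpson's rule for the integral over $u$ and bisection on $A$; the function names and the step counts are illustrative choices and are not taken from the paper. For large $B$ the integrand changes rapidly near $u^2 = A/B$, so `steps` would need to be increased.

```python
import math

def normal_upper_tail(x):
    """P(Z > x) for a unit normal deviate Z."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))

def normal_density(u):
    return math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)

def rejection_probability(A, B, half_width=8.0, steps=4000):
    """Left-hand side of (81): integral of phi(u) * P(v > A - B u^2) over u.

    Composite Simpson rule on [-half_width, half_width]; steps must be even.
    """
    h = 2.0 * half_width / steps
    total = 0.0
    for i in range(steps + 1):
        u = -half_width + i * h
        weight = 1.0 if i in (0, steps) else (4.0 if i % 2 else 2.0)
        total += weight * normal_density(u) * normal_upper_tail(A - B * u * u)
    return total * h / 3.0

def solve_A(B, alpha, steps=4000):
    """Bisection for A; the rejection probability decreases as A increases."""
    lo, hi = -10.0, 50.0 * B + 10.0
    for _ in range(60):
        mid = 0.5 * (lo + hi)
        if rejection_probability(mid, B, steps=steps) > alpha:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

# Tabled quantity rho = A - (constant multiple of B), here for B = 1.5 at the .05 level.
rho = solve_A(1.5, 0.05) - 3.8414588 * 1.5
print(round(rho, 3))
```

The value printed may be compared with the corresponding entry of Table Ia.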

In particular cases when the values of $P_1$ and $P_2$ are substituted in the expression for $B$ (see Theorem 1 above) and in (79) and (80) above, these equations appear much less formidable. Thus in the case considered by R. A. Fisher [3], $P_1 = P_2 = \tfrac14$ and we get

(85)  $u = \tfrac43 M^{-1/2}(2d_1 - d_2 - d_3); \qquad v = -4(6M)^{-1/2}(2d_1 + d_2 + d_3)$

and the test becomes "reject the hypothesis $H_0$ at level of significance $\alpha$ when

(86)  $\phi = \dfrac{(2d_1 - d_2 - d_3)^2 - \tfrac32(2d_1 + d_2 + d_3)}{\tfrac38 (6M)^{1/2}} > A$"

where

(87)  $\dfrac{1}{2\pi}\displaystyle\int_{-\infty}^{+\infty} e^{-u^2/2}\left\{\int_{A - u^2\sqrt{3M/8}}^{\infty} e^{-v^2/2}\, dv\right\} du = \alpha$.

Example. Fisher [3] gives an example of the case $P_1 = P_2 = \tfrac14$. In the series of experiments that he quotes the observed results fall in the four categories respectively as follows:

$m_1 = 32$; $m_2 = 904$; $m_3 = 906$; $m_4 = 1997$; $M = 3839$.

Hence $d_1 = -207.9375$; $d_2 + d_3 = 370.375$. From (86), $\phi = 10863.1$. $B = 37.94239$. From the tables:

at .05 level, $A_{.05} = 3.8414588 \times 37.94239 + 0.0075 = 145.7615$

at .01 level, $A_{.01} = 6.6348966 \times 37.94239 + 0.0065 = 251.750$.

Hence we reject the hypothesis that $\theta = 0$, i.e. that there is no linkage, since the value of $\phi$ is well outside even the .01 level of significance.
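The arithmetic of this example may be reproduced from the general formulae (79), (80) and (82). The sketch below is illustrative only; the function name `parabolic_statistic` and its argument layout are hypothetical, and the expected frequencies under $H_0$ are taken to be $M p_{0i}$ with the $p_{0i}$ of (28) and (29).

```python
import math

def parabolic_statistic(m, P1, P2):
    """Return (u, v, B, phi) of (79), (80), (82), with phi = v + B*u^2.

    m is the tuple of observed cell frequencies (m1, m2, m3, m4).
    Not applicable when P1 = P2 = 1/2, where v and B are undefined.
    """
    M = sum(m)
    p0 = (P1 * P2, P1 * (1 - P2), P2 * (1 - P1), (1 - P1) * (1 - P2))
    d = [mi - M * pi for mi, pi in zip(m, p0)]          # deviations d_i of (78)
    D = P1 * P2 * (1 - P1) * (1 - P2)
    spread = P1 * (1 - P1) * (2 * P2 - 1) ** 2 + P2 * (1 - P2) * (2 * P1 - 1) ** 2
    if spread == 0:
        raise ValueError("P1 = P2 = 1/2: use the special-case test (83)")
    u = (d[0] * (1 - P1 - P2) - d[1] * P2 - d[2] * P1) / math.sqrt(M * D)
    v = (P1 * (1 - P1) * (2 * P2 - 1) * (d[0] + d[2])
         + P2 * (1 - P2) * (2 * P1 - 1) * (d[0] + d[1])) / math.sqrt(M * D * spread)
    B = math.sqrt(M * D / spread)
    return u, v, B, v + B * u * u

# Fisher's data as quoted in the example above.
u, v, B, phi = parabolic_statistic((32, 904, 906, 1997), 0.25, 0.25)
print(round(B, 5), round(phi, 1))
```

The printed values of $B$ and $\phi$ may be set against those quoted in the example, and $\phi$ compared with $A_{.05}$ and $A_{.01}$.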

6. Power function of the Tests. General Case. The parabolic test as described above has the desirable property that of all tests (at level of significance $\alpha$) which are unbiased for large values of $M$ this test will detect small variations in $\theta$ most frequently. However, to get a clearer idea of the properties of this test we shall calculate, as accurately as may be practicable, the power function of the test.

¹ This multiple is equal to $k_\alpha^2$ where $\dfrac{1}{\sqrt{2\pi}}\displaystyle\int_{-k_\alpha}^{k_\alpha} e^{-t^2/2}\, dt = 1 - \alpha$, $\alpha$ being the level of significance.

As a preliminary step we obtain a rough idea of the power function by making use of the concept of a limiting power function as stated by Neyman [6]. This may be defined as follows:

Let $E_{M'}$ denote the sample point corresponding to a sample of size $M'$, and put

(88)  $\beta_{M'}(w \mid \vartheta') = P\{E_{M'} \in w \mid \vartheta'\}$

where $\vartheta' = M'^{1/2}\theta$, $w$ being a fixed region. Supposing $\vartheta'$ kept fixed, let $M'$ increase and let

(89)  $\beta_\infty(w \mid \vartheta') = \lim_{M' \to \infty} \beta_{M'}(w \mid \vartheta')$

if this limit exists.

Then $\beta_\infty(w \mid \vartheta')$ is the limiting power function of the test associated with the critical region $w$. It will be noted that the limiting power function is a function of $\vartheta'$.

In the problem under consideration the parabolic test when the sample size is $M$ is associated with the critical region $w_M$. Now it should be noted that in the definition of the limiting power function $w$ remains fixed. Therefore the limiting power function of the parabolic test for sample size $M$ is

(90)  $\beta_\infty(w_M \mid \vartheta') = \lim_{M' \to \infty} \beta_{M'}(w_M \mid \vartheta')$.

The significance of the limiting power function is that for any $\epsilon > 0$ and for any $\vartheta'$ there is a number $M_{\epsilon,\vartheta'}$ such that for all $M > M_{\epsilon,\vartheta'}$ we have in our case (by Theorem 1 of the Appendix)

(91)  $|\beta_M(w_M \mid \vartheta') - \beta_\infty(w_M \mid \vartheta')| < \epsilon$.

It should be noted, however, that the limiting power curve (the graph of the limiting power function against $\theta = \vartheta M^{-1/2}$) may be only a very rough approximation to the actual power curve. Furthermore (Neyman, [6, p. 83]) we cannot, in general, use the limiting power function of a test to answer the question:

"How large must we take our sample size $M$ to detect the falsehood of the hypothesis $H_0(\theta = 0)$ when actually $\theta = \theta'$, with a limiting probability of at least, say, 0.95?"

For if we form a table as below

   M        $\vartheta_{(M)} = M^{1/2}\theta'$        $\beta_\infty(w_M \mid \vartheta_{(M)})$
   100
   1000

it is possible that $\beta_\infty(w_M \mid \vartheta_{(M)})$ may never attain the value 0.95.

Theorem 3. The limiting power function of the parabolic test is

(92)  $\beta_\infty(w_M \mid \vartheta) = \dfrac{1}{2\pi}\displaystyle\int_{-\infty}^{+\infty} e^{-(u - k\vartheta)^2/2}\left\{\int_{A - Bu^2}^{\infty} e^{-v^2/2}\, dv\right\} du$, with $k$ as in (59),

in all cases for which $0 < P_i < 1$ and $P_1$ and $P_2$ are not both equal to $\tfrac12$.

The proof of this theorem follows immediately from Theorem 1 of the Appendix by applying the transformation (46)-(48) and putting $\lambda = P_1 P_2$.

The above remarks concerning special precautions to be taken with respect to the limiting power function suggest the necessity of studying the actual power function of the parabolic test by some other method.
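A limiting power function of this kind lends itself to direct numerical evaluation. The sketch below rests on an assumption that should be stated plainly: under the alternative $\vartheta = \theta\sqrt{M}$ the variate $u$ is taken to be a unit normal deviate centred at $k\vartheta$, with $k = \{P_1P_2/((1 - P_1)(1 - P_2))\}^{1/2}$, while $v$ remains a unit normal deviate centred at zero. The function name, the Simpson integration and the default step count are illustrative choices.

```python
import math

def upper_tail(x):
    """P(Z > x) for a unit normal deviate Z."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))

def limiting_power(theta, M, P1, P2, A, B, steps=20000, half_width=10.0):
    """Integral of phi(u - shift) * P(v > A - B u^2) over u (Simpson's rule).

    shift = k * theta * sqrt(M) is the assumed mean of u under the alternative.
    steps must be even.
    """
    k = math.sqrt(P1 * P2 / ((1.0 - P1) * (1.0 - P2)))
    shift = k * theta * math.sqrt(M)
    h = 2.0 * half_width / steps
    total = 0.0
    for i in range(steps + 1):
        u = shift - half_width + i * h
        weight = 1.0 if i in (0, steps) else (4.0 if i % 2 else 2.0)
        density = math.exp(-0.5 * (u - shift) ** 2) / math.sqrt(2.0 * math.pi)
        total += weight * density * upper_tail(A - B * u * u)
    return total * h / 3.0

# Case P1 = P2 = 1/4, M = 3839, with A and B as found in the example of paragraph 5.
for theta in (0.0, 0.05, 0.10):
    print(theta, round(limiting_power(theta, 3839, 0.25, 0.25, 145.7615, 37.94239), 5))
```

The values printed may be compared with the "Limiting" column of Table IIb.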

With this object in view, a study was made of the distribution of the function $\phi = v + Bu^2$ for finite values of $M$ and in particular for $M = 100$ and $M = 3839$. $\phi$ is a discontinuous variate and, for any given value of $M$, has definite limits of variation arising from the limitations on the values of the variables $m_i$ stated in the inequalities (25), (26) above. These limits of variation of $\phi$ were found to be

(93)  $-\tfrac83 (6M)^{-1/2}\left(\tfrac98 M + \tfrac1{16}\right) \le \phi \le \tfrac83 (6M)^{-1/2}\,\tfrac94 M\left(\tfrac94 M - 1\right)$

for the case $P_1 = P_2 = \tfrac14$. Hence when

$M = 100$, $\quad -12.25 \le \phi \le 5486.86$,

$M = 3839$, $\quad -75.89 \le \phi \le 1310795.75$.

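Also it was found that

(94)  $\mathcal{E}(\phi \mid \theta) = B\left[1 + \dfrac{(1 - 2P_1)(1 - 2P_2)}{(1 - P_1)(1 - P_2)}(e^{\theta} - 1) + \dfrac{(M - 1)P_1 P_2}{(1 - P_1)(1 - P_2)}(e^{\theta} - 1)^2\right]$

where $\mathcal{E}(\phi \mid \theta)$ denotes the expected value of $\phi$, given the value of the parameter $\theta$. Thus when $P_1 = P_2 = \tfrac14$ we have $B = \sqrt{\tfrac38 M}$ and so $\mathcal{E}(\phi \mid 0) = \sqrt{\tfrac38 M}$. Hence when

$M = 100$, $\quad \mathcal{E}(\phi \mid 0) = 6.12372$,

$M = 3839$, $\quad \mathcal{E}(\phi \mid 0) = 37.94239$.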


It is thus seen that the distribution of $\phi$ might be represented by a Type III curve, since the distribution of $\phi$ has a finite lower bound and a very long positive tail. In order to fit a Type III curve, we must know the second moment of the curve as well as its lower bound and mean. The general expression for the second moment about zero is too complicated to be printed and so only the numerical expressions obtained by giving special values to $M$ are given below. These are:

(i) $M = 100$

(95)  $\mathcal{E}(\phi^2 \mid \theta) = 112.41667 + 165.62963(e^{\theta} - 1) + 2493.33333(e^{\theta} - 1)^2 + 1078.00000(e^{\theta} - 1)^3 + 4356.91667(e^{\theta} - 1)^4$,

(ii) $M = 3839$

(96)  $\mathcal{E}(\phi^2 \mid \theta) = 4318.79213 + 6397.29625(e^{\theta} - 1) + 3684321.24073(e^{\theta} - 1)^2 + 1636267.33255(e^{\theta} - 1)^3 + 261530062.11111(e^{\theta} - 1)^4$.

Using the above results Type III curves were fitted to the distribution of $\phi$, and approximate values of the power function $\beta(w_M \mid \theta)$, at level of significance .05, were calculated. This was obtained by evaluating $P\{\phi \ge A_{.05} \mid \theta\}$ and assuming the distribution of $\phi$ to be that given by the fitted curve. Then

(97)  $\beta(w_M \mid \theta) = P\{\phi \ge A_{.05} \mid \theta\}$.

The values obtained for the limiting and approximate power functions are given in Tables IIa, IIb. Unfortunately the agreement between the two is not satisfactory.

Special Case. For the cases $P_1 = P_2 = \tfrac12$ ($M = 100$, $M = 400$) power functions were calculated on the assumption that for a given value of $\theta$, the random variable $2M^{-1/2}(d_2 + d_3)$ is distributed normally about a mean $-M^{1/2}(e^{\theta} - 1)$ with standard deviation $\sqrt{e^{\theta}(2 - e^{\theta})}$. This is approximately the case for the values of $M$ considered. The approximate power functions so calculated are given in Tables IIIa, IIIb.
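The normal approximation just described is easily evaluated. In the sketch below the criterion is $|2M^{-1/2}(d_2 + d_3)| > a$ as in (83); the value $a = 1.959964$ is assumed for the .05 level, and the mean is taken as $-M^{1/2}(e^{\theta} - 1)$ (its sign does not affect the two-sided probability). The function name is hypothetical.

```python
import math

def upper_tail(x):
    """P(Z > x) for a unit normal deviate Z."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))

def special_case_power(theta, M, a=1.959964):
    """Approximate power of the test (83) when P1 = P2 = 1/2.

    The criterion 2*M**-0.5*(d2 + d3) is treated as normal with the mean and
    standard deviation stated in the text; a is the assumed .05-level limit.
    """
    mean = -math.sqrt(M) * (math.exp(theta) - 1.0)
    sd = math.sqrt(math.exp(theta) * (2.0 - math.exp(theta)))
    # P(X > a) + P(X < -a) for X normal with the above mean and sd.
    return upper_tail((a - mean) / sd) + upper_tail((a + mean) / sd)

for theta in (-0.05, 0.0, 0.05):
    print(theta, round(special_case_power(theta, 100), 5))
```

The values printed may be compared with the corresponding rows of Table IIIa; putting `M = 400` gives the analogue for Table IIIb.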


7. Parabolic Test and $\chi^2$ Test. It is interesting to note the close connection between the parabolic test and the $\chi^2$ test as introduced for intuitive reasons and normally used in testing for linkage. The $\chi^2$ test consists of calculating the quantity

(98)  $\chi^2 = \dfrac{1}{M P_1 P_2 (1 - P_1)(1 - P_2)}\{(1 - P_1)(1 - P_2) m_1 - P_2(1 - P_1) m_2 - P_1(1 - P_2) m_3 + P_1 P_2 m_4\}^2$

and rejecting the hypothesis $H_0(\theta = 0)$ if $|\chi| \ge a$ where

(99)  $\dfrac{1}{\sqrt{2\pi}}\displaystyle\int_{-a}^{+a} e^{-x^2/2}\, dx = 1 - \alpha$.

In the special case ($P_1 = P_2 = \tfrac12$) the parabolic test and the $\chi^2$ test are identical; while comparing (98) and (79) we see that in the general case

(100)  $u = \chi$.

Hence in the general case the criterion used in the parabolic test may be written

(101)  $\phi = v + B\chi^2$.

(1) Large Samples. For large samples the first term of the expression $v + B\chi^2$ is usually of small importance, since

$v$ is of form $M^{-1/2} \times$ (linear function of the $d_i$'s), while
$B\chi^2$ is of form $M^{-1/2} \times$ (quadratic function of the $d_i$'s).

For such samples the $\chi^2$ test and parabolic test would appear to be nearly equivalent.


TABLE II

Limiting and Approximate Power Functions of Parabolic Test

$P_1 = P_2 = \tfrac14$; $\quad -\infty < \theta < 1.386$

(An entry shown as - is blank in the table.)

Table IIa. M = 100

   theta     Limiting    Approximate
   -2.00     -           0.90870
   -1.50     0.99880     -
   -1.40     -           0.77656
   -1.20     0.97915     0.69505
   -1.05     0.93786     -
   -1.00     -           0.58580
   -0.90     0.85024     -
   -0.75     0.70467     0.42756
   -0.60     0.51532     -
   -0.45     0.32258     0.21849
   -0.30     0.16986     0.12504
   -0.15     0.07905     0.05689
   -0.10     0.06280     0.04438
   -0.05     0.05318     0.03866
    0.00     0.05000     0.04069
    0.05     0.05318     0.05021
    0.10     0.06280     0.07429
    0.15     0.07905     -
    0.30     0.16986     0.26559
    0.45     0.32258     -
    0.60     0.51532     0.75854
    0.75     0.70467     0.94245

Table IIb. M = 3839

   theta     Limiting    Approximate
   -0.25     0.99932     0.99853
   -0.20     0.98502     0.97521
   -0.15     0.87243     0.83620
   -0.10     0.54197     0.52066
   -0.05     0.17827     0.19223
    0.00     0.05000     0.04111
    0.05     0.17827     0.21568
    0.10     0.54197     0.59517
    0.15     0.87243     0.91641
    0.20     0.98502     0.99640
    0.25     0.99932     0.99999


Theorem 4. The limiting power function of the $\chi^2$ test is

(102)  $\beta_\infty(w_\chi \mid \vartheta) = 1 - \dfrac{1}{\sqrt{2\pi}}\displaystyle\int_{-a}^{+a} e^{-(u - k\vartheta)^2/2}\, du$, with $k$ as in (59)

($w_\chi$ denotes the region defined by the inequality $|\chi| \ge a$).

This theorem may be proved by applying (46)-(48) to $Q_\vartheta(x_1, x_2, x_3)$ in Theorem 1 of the Appendix, and noting that $u = \chi$ by (100).

We notice that $\beta_\infty(w_\chi \mid \vartheta)$, for a given value of $\vartheta$, has the same value for all values of $M$, unlike the limiting power function $\beta_\infty(w_M \mid \vartheta)$ of the parabolic test. It is this point which accounts for the seeming paradox that, despite the manner in which the parabolic test was defined, for all values of $\vartheta$ and $M$

(103)  $\beta_\infty(w_\chi \mid \vartheta) \ge \beta_\infty(w_M \mid \vartheta)$

as may be deduced from (92) and (102). This does not mean that for any given $\vartheta$ and all $M$ sufficiently large the power function of the $\chi^2$ test, $\beta_M(w_\chi \mid \vartheta)$, is necessarily not less than the power function of the parabolic test, $\beta_M(w_M \mid \vartheta)$. For although, given any $\epsilon > 0$, there is a number $M_{\epsilon,\vartheta}$ such that if $M > M_{\epsilon,\vartheta}$

(104)  $|\beta_M(w_\chi \mid \vartheta) - \beta_\infty(w_\chi \mid \vartheta)| < \epsilon$

and

(105)  $|\beta_M(w_M \mid \vartheta) - \beta_\infty(w_M \mid \vartheta)| < \epsilon$

it may be that for such values of $M$

(106)  $0 < \beta_\infty(w_\chi \mid \vartheta) - \beta_\infty(w_M \mid \vartheta) < 2\epsilon$.

The above results show, however, how close the agreement between the power functions of the two tests is for large values of $M$. In fact we have

(107)  $\lim_{M \to \infty} \beta_M(w_M \mid \vartheta) = \beta_\infty(w_\chi \mid \vartheta)$.

This may be easily proved, since as $M$ increases $w_M$ approximates to $w_\chi$.

TABLE III

Approximate Power Function

$P_1 = P_2 = \tfrac12$; $\quad -\infty < \theta < 0.693$

   Table IIIa. M = 100          Table IIIb. M = 400

   theta     Power              theta     Power
   -0.45     0.96288            -0.25     0.99424
   -0.40     0.92161            -0.20     0.95482
   -0.35     0.85072            -0.15     0.79787
   -0.30     0.74351            -0.10     0.47734
   -0.25     0.60197            -0.05     0.16378
   -0.20     0.44054            -0.02     0.06810
   -0.15     0.28380             0.00     0.05000
   -0.10     0.15727             0.02     0.06885
   -0.05     0.07737             0.05     0.17609
    0.00     0.05000             0.10     0.55737
    0.05     0.08029             0.15     0.90213
    0.10     0.18177             0.20     0.99431
    0.15     0.36464             0.25     0.99995
    0.20     0.60278
    0.25     0.82071
    0.30     0.94975
    0.35     0.99299
(2) Small Samples. In order to obtain some idea of the relations between the two tests when $M$ is small (i.e. less than 100), the case $P_1 = P_2 = \tfrac14$, $M = 32$ was considered in some detail.

In this case our tests at 5% level of significance are respectively

$\chi^2$ test, reject if

(108)  $|2y - z| \ge 8.315$

parabolic test, reject if

(109)  $(2y - z)^2 - \tfrac32(2y + z) \ge 69.576$

where

(110)  $y = d_1, \qquad z = d_2 + d_3$.

All samples for which the verdicts of the two above tests would not agree were obtained. These were as follows:

(a) Samples for which $H_0$ is accepted by $\chi^2$ test, rejected by parabolic test

   y =    1    0   -1   -2   -2
   z =   -6   -8  -10  -11  -12

Probability of drawing sample of this type when $H_0$ is true is 0.00320.

(b) Samples for which $H_0$ is rejected by $\chi^2$ test, accepted by parabolic test

   y =    0    1    2    3    5    6    7    8    8    9    9
   z =    9   11   13   15    1    3    5    6    7    8    9

Probability of drawing sample of this type when $H_0$ is true is 0.00038.

Thus the probability of the two tests giving different verdicts when $H_0$ is in fact true is only 0.00358.

It will be noted that the above results imply that

(111)  $\beta_{32}(w_{32} \mid 0) - \beta_{32}(w_\chi \mid 0) = 0.00320 - 0.00038 = 0.00282$;

i.e. that the true levels of significance of the two tests are not equal. This is to be expected, because of the discontinuity of the probability distribution of sample points, which makes it unlikely that the level of significance of either test is exactly .05.

Similarly we can obtain values of $\beta_{32}(w_{32} \mid \theta) - \beta_{32}(w_\chi \mid \theta)$, the differences in the powers of the two tests with respect to various alternative hypotheses. These values were obtained for a few values of $\theta$.

   theta     $\beta_{32}(w_{32} \mid \theta) - \beta_{32}(w_\chi \mid \theta)$
   -0.5       0.01625
    0.0       0.00282
    0.5      -0.00006

These figures indicate that the parabolic test detects negative $\theta$'s better than the $\chi^2$ test, but that the $\chi^2$ test detects positive $\theta$'s better than the parabolic test, although the advantage in this latter case is minute.
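The comparison for $M = 32$ can be repeated by direct enumeration of the sample points. The sketch below is illustrative: it uses the rejection rules (108) and (109), reading the coefficient of $(2y + z)$ in (109) as $\tfrac32$, and takes the cell probabilities under $\theta$ to be $p_1 = P_1P_2e^{\theta}$, $p_2 + p_3 = P_1 + P_2 - 2P_1P_2e^{\theta}$ (an assumption based on paragraph 8). Cells 2 and 3 are pooled, since both criteria depend on them only through $z = d_2 + d_3$. The function names are hypothetical.

```python
from math import comb, exp

M = 32                 # sample size considered in the text
P1 = P2 = 0.25         # the case P1 = P2 = 1/4

def chi_rejects(y, z):
    return abs(2 * y - z) >= 8.315                          # rule (108)

def parabolic_rejects(y, z):
    return (2 * y - z) ** 2 - 1.5 * (2 * y + z) >= 69.576   # rule (109)

def disagreements(theta=0.0):
    """Sample points (y, z) on which the two verdicts differ, with probabilities."""
    p1 = P1 * P2 * exp(theta)
    p23 = (P1 - p1) + (P2 - p1)          # pooled probability of cells 2 and 3
    p4 = 1.0 - p1 - p23
    only_parabolic, only_chi = {}, {}
    for m1 in range(M + 1):
        for s in range(M + 1 - m1):      # s = m2 + m3
            prob = comb(M, m1) * comb(M - m1, s) * p1 ** m1 * p23 ** s * p4 ** (M - m1 - s)
            y, z = m1 - 2, s - 12        # expected counts under H0: M/16 = 2 and 6M/16 = 12
            par, chi = parabolic_rejects(y, z), chi_rejects(y, z)
            if par and not chi:
                only_parabolic[(y, z)] = prob
            elif chi and not par:
                only_chi[(y, z)] = prob
    return only_parabolic, only_chi

only_parabolic, only_chi = disagreements(0.0)
print(sorted(only_parabolic), round(sum(only_parabolic.values()), 5))
print(sorted(only_chi), round(sum(only_chi.values()), 5))
```

Calling `disagreements(theta)` with other values of `theta` gives the corresponding differences in power between the two tests.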

The critical regions associated with the two tests may be represented by regions in the $(y, z)$ plane. The critical region for the parabolic test will be defined by

(112)  $(2y - z)^2 - \tfrac32(2y + z) \ge \nu$

and that for the $\chi^2$ test, $w_\chi$, by

(113)  $(2y - z)^2 \ge \nu'$

where $\nu \approx \nu'$.

$w_\chi$ is therefore the complement of the region lying between the lines $L_1$, $L_2$ with equations $2y - z = \pm\sqrt{\nu'}$; $w_M$ lies outside the parabola $K$ with equation $(2y - z)^2 - \tfrac32(2y + z) = \nu$.

Since $\nu \approx \nu'$, $K$ meets $L_1$, $L_2$ at points near the respective intersections of $L_1$, $L_2$ with the line $2y + z = 0$. See Figure 1.

In the diagram the regions $V_1$, $V_2$ contain all sample points for which the $\chi^2$ test rejects and the parabolic test accepts $H_0$; $U_1$, $U_2$ contain all sample points for which the $\chi^2$ test accepts and the parabolic test rejects $H_0$.

For a given value of $\theta$ it is known that the probability distribution is approximately such that the quantity

(114)  $\chi_\theta^2 = \dfrac{\{y - \tfrac1{16}M(e^{\theta} - 1)\}^2}{\tfrac1{16}M + \tfrac1{16}M(e^{\theta} - 1)} + \dfrac{\{z + \tfrac18 M(e^{\theta} - 1)\}^2}{\tfrac38 M - \tfrac18 M(e^{\theta} - 1)} + \dfrac{\{y + z + \tfrac1{16}M(e^{\theta} - 1)\}^2}{\tfrac9{16}M + \tfrac1{16}M(e^{\theta} - 1)}$

is distributed as $\chi^2$ with 2 degrees of freedom.

The ellipses of equal density $\chi_\theta^2$ = constant have centers at points $(\tfrac1{16}M[e^{\theta} - 1],\; -\tfrac18 M[e^{\theta} - 1])$ which must lie on the line $2y + z = 0$. When $\theta = 0$ the center is at the origin, and the major and minor axes of the ellipse make angles of approximately 99.5° and 9.5° respectively with the $y$-axis. For small changes in $\theta$ the angles of inclination of the major and minor axes of the ellipse to the coordinate axes are not greatly changed, and we see that as the center of the ellipse moves along the line $2y + z = 0$ we have

(1) $\theta$ increasing: center moves downwards, tending to increase $P\{E \in U_2\} - P\{E \in V_2\}$ while $P\{E \in V_1\}$ and $P\{E \in U_1\}$ both become small. Thus $\beta_M(w_M \mid \theta)$ tends to increase quicker than $\beta_M(w_\chi \mid \theta)$.

(2) $\theta$ decreasing: here we have the opposite effect and $\beta_M(w_M \mid \theta)$ tends to increase slower than $\beta_M(w_\chi \mid \theta)$.

These conclusions agree qualitatively with those drawn in the case $M = 32$. (N.B. In the case $M = 32$ no sample points fall into the region $U_1$ because no points in $U_1$ satisfy the inequalities (25), (26)).

Fig. 1

8. Some Geometrical Considerations. In this section we shall consider the manner in which the situations dealt with above may be interpreted in terms of geometrical concepts. It will be convenient to consider as variables $n_i = m_i/M$. The sample space $W(n)$ is then bounded by the four planes

(115)  $n_i = 0 \quad (i = 1, 2, 3), \qquad \sum_{i=1}^{3} n_i = 1$.

In this space, corresponding to any admissible hypothesis $H_\theta$ specifying a value of $\theta$, there is a point $T_\theta$ with coordinates $(\theta^{n_1}, \theta^{n_2}, \theta^{n_3})$ where

(116)  $\theta^{n_1} = P_1 P_2 e^{\theta}, \qquad \theta^{n_2} = P_1(1 - P_2 e^{\theta}), \qquad \theta^{n_3} = P_2(1 - P_1 e^{\theta})$.

These are the proportions of results expected in the first three cells, if the hypothesis $H_\theta$ specifying $\theta$ be true.
Now, if $H_\theta$ be true, we have

(117)  $P\{n_1 = n_1',\, n_2 = n_2',\, n_3 = n_3',\, n_4 = n_4' \mid H_\theta\} \approx c\, e^{-\chi_\theta^2/2}$

where $c$ is constant for a fixed sample size $M$, and

(118)  $\chi_\theta^2 = M \sum_{i=1}^{4} \dfrac{(n_i' - \theta^{n_i})^2}{\theta^{n_i}}, \qquad \theta^{n_4} = 1 - \sum_{i=1}^{3} \theta^{n_i}$.

Hence the most frequent position(s) of the sample point $E$ will be somewhere near the point $T_\theta$, which I shall therefore call the center of density. It will be noticed that, whatever be the value of $\theta$, the point $T_\theta$ must lie on the line

(119)  $n_1 - P_1 P_2 = -[n_2 - P_1(1 - P_2)] = -[n_3 - P_2(1 - P_1)]$.

This line, a segment of which is the locus of the center of density for our set of admissible hypotheses, will be called the line of density.

In this space the parabolic test corresponds to a critical region comprising the exterior of a parabolic cylinder. The equation of the boundary of this critical region at level of significance .05 was found for the case $P_1 = P_2 = \tfrac14$, and a model made of it. Also included in the model were the ellipsoids

(120)  $\chi_\theta^2 = K_{.05}$

where $K_{.05}$ is a constant so chosen that

(121)  $P\{\chi_\theta^2 \ge K_{.05} \mid \theta\} = .05$

corresponding to

(i) the case when $H_0$ is true

(ii) the cases when

(122)  (a) $p_1 = \tfrac3{32}$, $p_2 = p_3 = \tfrac5{32}$, $p_4 = \tfrac{19}{32}$, i.e. $\theta = 0.41$

(123)  (b) $p_1 = \tfrac1{32}$, $p_2 = p_3 = \tfrac7{32}$; $p_4 = \tfrac{17}{32}$, i.e. $\theta = -0.69$.

It was found that in the case $P_1 = P_2 = \tfrac14$ one axis of all the $\chi_\theta$-ellipsoids was perpendicular to the plane through the line of density and the axis of $n_1$. The generators of the boundary of the parabolic acceptance region are also perpendicular to this plane. (By "acceptance region" is meant the complement of the critical region. The acceptance region may be written symbolically $\bar{w}_M$.) There were further added to the model the intersections with this plane of the ellipsoids at probability level .01, corresponding to the three hypotheses considered above ($\theta = 0,\ 0.41,\ -0.69$) and two others, viz.

(124)  $p_1 = \tfrac5{32}$, $p_2 = p_3 = \tfrac3{32}$, $p_4 = \tfrac{21}{32}$, i.e. $\theta = 0.92$,

(125)  $p_1 = \tfrac1{64}$; $p_2 = p_3 = \tfrac{15}{64}$; $p_4 = \tfrac{33}{64}$, i.e. $\theta = -1.39$.

For convenience in making the model to a simple scale (1 unit = 150 cms.) it was found necessary to take the sample size $M$ as 1312.5. The model is shown in Figure 2. It will be seen that the acceptance region for the parabolic test is approximately enclosed between two parallel planes perpendicular to the plane common to the line of density and the axis of $n_1$. These two planes, in fact, enclose the acceptance region for the $\chi^2$ test. The vertex of the normal parabolic section of the parabolic acceptance region is at a comparatively great distance "below" the plane $n_1 = 0$.

Fig. 2

As an interesting digression we may use our model to compare qualitatively the parabolic test with yet a third possible test of $H_0$. This test is to reject $H_0$ at level of significance .05 if

(126)  $\chi_0^2 \ge K_{.05}$

and may be called the $\chi_0$ test. The $\chi_0$-ellipsoid shown in the model is the acceptance region for this test. It will be noticed that when $\theta \ne 0$ the ellipsoids of equal density include somewhat more of the acceptance region of the $\chi_0$ test than of the parabolic acceptance region. This means that the $\chi_0$ test would detect that the hypothesis $H_0(\theta = 0)$ is false in these cases, less frequently than would the parabolic and $\chi^2$ tests. We also notice that the center of density $T_\theta$ leaves the parabolic acceptance region before it leaves the acceptance region of the $\chi_0$ test as it moves along the line of density from the point where $\theta = 0$, whether the direction of motion of $T_\theta$ corresponds to $\theta$ increasing or decreasing. This also indicates that the $\chi_0$ test would act less efficiently than the other two tests.


9. Appendix. In this appendix are obtained various results which, while essential to the main argument, would appear as digressions if they were interpolated as required. The numbering of equations in this appendix does not continue from that of the previous sections, but forms a separate group.

Lemma. If $f_0(m), f_1(m), \ldots, f_s(m)$ be $(s + 1)$ functions of the $k$ variables
$m_1, m_2, \ldots, m_k$ which are zero except for a finite number of sets of integral values
of $m_1, \ldots, m_k$; and if $w_0$ be a region in the space of $m$'s such that

(1)  $f_0(m) \ge \sum_{i=1}^{s} a_i f_i(m)$ in $w_0$,

(2)  $f_0(m) \le \sum_{i=1}^{s} a_i f_i(m)$ in $\bar{w}_0$,

$a_1, a_2, \ldots, a_s$ being arbitrary constants; then if $w$ be any region such that

(3)  $\sum_{w} f_i(m) = \sum_{w_0} f_i(m)$  $(i = 1, \ldots, s)$,

we shall have

(4)  $\sum_{w} f_0(m) \le \sum_{w_0} f_0(m)$.

Proof. Let

(5)  $\delta = \sum_{w_0} f_0(m) - \sum_{w} f_0(m) = \sum_{w_0 - ww_0} f_0(m) - \sum_{w - ww_0} f_0(m)$,

where $ww_0$ denotes the common part of $w$ and $w_0$.

The region $w - ww_0$, consisting of those points of $w$ which are not in
$ww_0$, and so not in $w_0$, is contained in $\bar{w}_0$. Similarly the region $w_0 - ww_0$ is
contained in $w_0$. Hence, by inequalities (1) and (2),

(6)  $\delta \ge \sum_{w_0 - ww_0} \Big\{ \sum_{i=1}^{s} a_i f_i(m) \Big\} - \sum_{w - ww_0} \Big\{ \sum_{i=1}^{s} a_i f_i(m) \Big\}$,

and so

(7)  $\delta \ge \sum_{w_0} \Big\{ \sum_{i=1}^{s} a_i f_i(m) \Big\} - \sum_{w} \Big\{ \sum_{i=1}^{s} a_i f_i(m) \Big\}$.

Since the total number of terms in each double summation is finite, we have

(8)  $\delta \ge \sum_{i=1}^{s} a_i \Big\{ \sum_{w_0} f_i(m) - \sum_{w} f_i(m) \Big\}$.

But

(9)  $\sum_{w_0} f_i(m) = \sum_{w} f_i(m)$  $(i = 1, \ldots, s)$.

Hence

    $\delta \ge 0$, and $\sum_{w} f_0(m) \le \sum_{w_0} f_0(m)$.

A lemma similar to the lemma above, where the $f$'s are taken to be integrable
functions and summation over the regions $w$, $w_0$ is replaced by integration over
these regions, is given by Neyman and Pearson [9]. The proof given above
follows the lines of the proof given in that paper.
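
The content of the lemma can be checked by brute force on a small lattice. The following is only an illustrative sketch: the five points, the integer weights standing in for $f_0$ and $f_1$ (taking $s = 1$), and the constant $a_1 = 2$ are all hypothetical choices, not values from the paper. It builds $w_0$ from inequality (1), enumerates every region $w$ meeting the side condition (3), and compares the sums of $f_0$.

```python
from itertools import combinations

# Hypothetical toy data (integer weights, so all sums are exact).
points = [0, 1, 2, 3, 4]
f0 = {0: 5, 1: 10, 2: 20, 3: 30, 4: 35}
f1 = {0: 30, 1: 30, 2: 20, 3: 10, 4: 10}
a1 = 2  # the constant a_1 appearing in (1) and (2)

# w0 collects the points where f0 >= a1 * f1, so (1) holds on w0 and (2) off it.
w0 = [m for m in points if f0[m] >= a1 * f1[m]]


def total(f, region):
    """Sum of f over the points of a region."""
    return sum(f[m] for m in region)


def rivals():
    """Yield every region w that satisfies the side condition (3)."""
    target = total(f1, w0)
    for size in range(len(points) + 1):
        for w in combinations(points, size):
            if total(f1, w) == target:
                yield w


# Largest sum of f0 attained by any region satisfying (3).
best_rival = max(total(f0, w) for w in rivals())
print(w0, total(f0, w0), best_rival)
```

Under these assumed weights no competing region gives a larger sum of $f_0$ than $w_0$ does, which is what inequality (4) asserts.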

Theorem 1. Suppose that, in a quadrinomial population:

(i) the cell probabilities are dependent on the number $M$ of trials made, and are
given by

(10)  $p_1 = p_{01} + \varphi_M$,
      $p_2 = p_{02} - \varphi_M$,
      $p_3 = p_{03} - \varphi_M$,
      $p_4 = p_{04} + \varphi_M$,

where

(11)  $\sum_{i=1}^{4} p_{0i} = \sum_{i=1}^{4} p_i = 1$

and

(12)  $\varphi_M = \lambda\,(e^{\theta/\sqrt{M}} - 1)$;


(ii)

(13)  $x_i = (m_i - M p_{0i})/(M p_{0i})^{1/2}$  $(i = 1, 2, 3, 4)$,

where $m_i$ = number of results falling in the $i$-th cell;

(iii) $w(x)$, or briefly $w$, is a region in the space $W$ of $x_1, x_2, x_3$; and $P_M(w)$
is the integral probability law of $w$ corresponding to the values $p_1, p_2, p_3, p_4$ of
the cell probabilities given in (10) above when we have $M$ independent trials.

Then

(14)  $P_M(w) \to \dfrac{1}{(2\pi)^{3/2}\, p_{04}^{1/2}} \iiint_{w} e^{-\frac{1}{2} Q(x_1, x_2, x_3)}\, dx_1\, dx_2\, dx_3$

uniformly over $W$ as $M \to \infty$, where

(15)  $Q(x_1, x_2, x_3) = \sum_{i=1}^{3} x_i^2 (1 + p_{0i} p_{04}^{-1}) + 2 p_{04}^{-1} \sum_{i<j\le 3} x_i x_j (p_{0i} p_{0j})^{1/2}$
      $\quad - 2\lambda\theta \{ x_1 (p_{01}^{-1/2} - p_{01}^{1/2} p_{04}^{-1}) - x_2 (p_{02}^{-1/2} + p_{02}^{1/2} p_{04}^{-1}) - x_3 (p_{03}^{-1/2} + p_{03}^{1/2} p_{04}^{-1}) \}$
      $\quad + \lambda^2 \theta^2 \sum_{i=1}^{4} p_{0i}^{-1}$.

This theorem may be proved by the same method as that used by F. N.
David [2] in proving the generalized theorem of Laplace.
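
One way to read the quadratic form (15) is as a sum of four squared deviations $\sum_{i=1}^{4} (x_i - \mu_i)^2$, with $x_4$ eliminated through the linear relation $\sum_{i=1}^{4} p_{0i}^{1/2} x_i = 0$ and with shifts $\mu_i = \pm\lambda\theta\, p_{0i}^{-1/2}$ carrying the signs of (10). That reading is an assumption made here for illustration, not a statement taken from the paper; the sketch below merely checks numerically that the two expressions agree at a few hypothetical points.

```python
import math


def Q(x, p0, lam, theta):
    """Quadratic form (15) in x = (x1, x2, x3) for null probabilities p0 = (p01..p04)."""
    p4 = p0[3]
    quad = sum(x[i] ** 2 * (1 + p0[i] / p4) for i in range(3))
    quad += (2 / p4) * sum(
        x[i] * x[j] * math.sqrt(p0[i] * p0[j])
        for i in range(3)
        for j in range(i + 1, 3)
    )
    lin = (
        x[0] * (p0[0] ** -0.5 - p0[0] ** 0.5 / p4)
        - x[1] * (p0[1] ** -0.5 + p0[1] ** 0.5 / p4)
        - x[2] * (p0[2] ** -0.5 + p0[2] ** 0.5 / p4)
    )
    const = (lam * theta) ** 2 * sum(1 / p for p in p0)
    return quad - 2 * lam * theta * lin + const


def shifted_sum_of_squares(x, p0, lam, theta):
    """Assumed reading: sum over four cells of (x_i - mu_i)^2, x4 fixed by the constraint."""
    x4 = -sum(math.sqrt(p0[i]) * x[i] for i in range(3)) / math.sqrt(p0[3])
    signs = (1, -1, -1, 1)  # signs of the perturbation in (10)
    full = list(x) + [x4]
    return sum(
        (full[i] - signs[i] * lam * theta / math.sqrt(p0[i])) ** 2 for i in range(4)
    )


p0 = (0.1, 0.2, 0.3, 0.4)   # hypothetical null cell probabilities
point = (0.3, -1.2, 0.7)    # hypothetical value of (x1, x2, x3)
print(Q(point, p0, 0.5, 1.3), shifted_sum_of_squares(point, p0, 0.5, 1.3))
```

With $\lambda\theta = 0$ the form reduces to the usual $\chi^2$ expression in three free cells, which is consistent with the normalizing constant in (14).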

I would like to thank Professor Neyman for his invaluable suggestions and 
advice in the preparation of this paper. 
REDUCTION OF A CERTAIN CLASS OF COMPOSITE
STATISTICAL HYPOTHESES

By George W. Brown 

1. Introduction. A situation frequently met in sampling theory is the
following: $x$ has distribution $f(x, \theta)$, where $\theta$ is an unknown parameter, and for
samples $(x_1, \ldots, x_n)$ there exists in the sample space $E_n$ a family of
$(n - 1)$-dimensional manifolds upon each of which the distribution is independent of
$\theta$; in addition there is a residual one-dimensional manifold available for
estimating $\theta$. For example, suppose there exists a sufficient statistic $T$ for $\theta$, then on
the manifolds $T = T_0$ there is defined an induced distribution which is
independent of the parameter.

A similar situation is observed when $\theta$ is a "location" or "scale" parameter.
Let $x$ have the distribution $f(x - a)$ for some $a$, then the set
$(x_2 - x_1, x_3 - x_1, \ldots, x_n - x_1)$, or any equivalent set, such as
$(x_1 - \bar{x}, \ldots, x_n - \bar{x})$, have a joint distribution independent of $a$, and there
is a residual distribution corresponding to each particular configuration
$(x_2 - x_1, \ldots, x_n - x_1)$. Fisher
[1] and Pitman [5] have examined the residual distributions in connection with
the problem of estimating scale and location parameters. In this paper we
shall be concerned primarily, not with the residual distribution, but with the
remainder of the sample information, corresponding to the $(n - 1)$-dimensional
distribution which is independent of the parameter. It is found, in a rather
broad class of distributions, that the part of the sample not used for estimation
determines, except for the parameter value, the original functional form of the
distribution of $x$.

This paper is devoted mainly to a study of particular classes of distributions
having the property mentioned above. We consider also the theoretical
application of this property to certain types of composite hypotheses which may be
reduced thereby to equivalent simple hypotheses.¹ The principal results of this
nature may be summed up as follows: If $x$ has distribution of the form $f(x, \theta)$,
where $\theta$ is either a location or scale parameter, or a vector denoting both, then
there exists, in samples $(x_1, \ldots, x_n)$, a set of functions $y_i(x_1, \ldots, x_n)$,
$i = 1, 2, \ldots, p$, $p < n$, having joint distribution $D(y_1, \ldots, y_p)$ independent of $\theta$,
and such that the converse statement holds, namely, if $\{y_i\}$ have the distribution
$D(y_1, \ldots, y_p)$, then $x$ has, for some $\theta$, a distribution of the form $f(x, \theta)$. There
is a corresponding statement when $x$ has a distribution of the form $f(x - \sum a_i u_i)$,
where the $\{a_i\}$ are parameters, and the $\{u_i\}$ are regression variables.

¹ We use the terms simple and composite hypotheses in the sense of Neyman and
Pearson [2].

2. Location and Scale. This section is devoted to the study of functions of
the sample observations which are such that their distributions determine the
distribution of $x$, except possibly for location and scale.
It will be assumed that associated with $x$ there is a function $F(x)$ such that

(a) $F(x)$ is monotone non-decreasing,

(b) $F(-\infty) = \lim_{x \to -\infty} F(x) = 0$, and (c) $F(\infty) = \lim_{x \to \infty} F(x) = 1$,

with the normalization $F(x)$ upper semi-continuous. $F(x)$ is the probability
that the random variate takes a value less than or equal to $x$. If $F(x)$ is
associated with the random variate $x$ we say that $x$ has the distribution $F(x)$.
If $g(x)$ is a Borel-measurable function, the Lebesgue-Stieltjes integral

    $\int_{-\infty}^{\infty} g(x)\, dF(x)$ is denoted by $E[g(x)]$.

The characteristic function $\varphi(t) = E(e^{itx})$ determines $F(x)$, that is, if

    $\int_{-\infty}^{\infty} e^{itx}\, dG(x) = \int_{-\infty}^{\infty} e^{itx}\, dF(x)$, then $F(x) = G(x)$.


Similarly, let $F(x_1, \ldots, x_k)$ be such that

(a) $F(x_1, \ldots, x_{i-1}, x_i + h, x_{i+1}, \ldots, x_k) \ge F(x_1, \ldots, x_i, \ldots, x_k)$ for
$h > 0$ and $i = 1, 2, \ldots, k$;

(b) $\lim_{x_i \to -\infty} F(x_1, \ldots, x_k) = 0$, $i = 1, 2, \ldots, k$;

(c) $\lim_{x_1, \ldots, x_k \to \infty} F(x_1, \ldots, x_k) = 1$;

with the normalization $F(x_1, \ldots, x_k)$ continuous on the right in each $x_i$. If
$F(x_1, \ldots, x_k)$ is associated with $x_1, \ldots, x_k$ we say that $x_1, \ldots, x_k$ have the
joint distribution $F(x_1, \ldots, x_k)$. As before, $E[H(x_1, \ldots, x_k)] = \int_{R_k} H\, dF$,
where $R_k$ is the Euclidean $k$-space. It is well known that under such
conditions, given Borel-measurable functions $y_i(x_1, \ldots, x_k)$, $i = 1, \ldots, p$, $p < k$,
then

    $G(y_1, \ldots, y_p) = \int_{R(y)} dF(x_1, \ldots, x_k)$,

where $R(y)$ is the region $[y_1(x_1, \ldots, x_k) \le y_1, \ldots, y_p(x_1, \ldots, x_k) \le y_p]$, is again
a distribution function satisfying the conditions above. Moreover,

    $\int_{R} g(y_1, \ldots, y_p)\, dG(y_1, \ldots, y_p) = \int_{R'} g[y_1(x_1, \ldots, x_k), \ldots, y_p(x_1, \ldots, x_k)]\, dF$,

where $R'$ is the set of all points $(x_1, \ldots, x_k)$ such that
$\{y_1(x_1, \ldots, x_k), \ldots, y_p(x_1, \ldots, x_k)\} \in R$.

If $x$ has distribution $F(x)$, then, by definition, the set $(x_1, \ldots, x_n)$ is a sample
from this distribution if $x_1, \ldots, x_n$ have the joint distribution $F(x_1) \cdots F(x_n)$.
The following theorem states that two distributions giving rise, in sampling,
to the same distribution of the set $x_1 - x_n, x_2 - x_n, \ldots, x_{n-1} - x_n$, with
$n \ge 3$, can differ at most by a translation, that is, the distribution of that set
determines the original distribution except for location.

Theorem Ia: Let $x$ have the distribution $F(x)$. Denote by $S$ the set of zeros of
$\int e^{itx}\, dF(x)$ and denote by $\epsilon$ the g.l.b. of $|t|$ for $t$ in $S$. Suppose that the
complement of $S$ is $\epsilon$-connected.² Suppose that $x'$ has distribution $G(x')$, and let
$x_1, \ldots, x_n$ and $x'_1, \ldots, x'_n$ be samples. Then the set $w_\alpha = x_\alpha - x_n$,
$\alpha = 1, \ldots, n - 1$, have the same joint distribution as the set $w'_\alpha = x'_\alpha - x'_n$ if
and only if there exists a constant $a$ such that $x' + a$ and $x$ have the same distribution.

Proof: The sufficiency of the condition follows immediately, since
$w'_\alpha = x'_\alpha - x'_n = (x'_\alpha + a) - (x'_n + a)$.

In establishing necessity, only the fact that $w_1, w_2$ have the same joint
distribution as $w'_1, w'_2$ is needed. This hypothesis implies that

    $E[e^{i(t_1 w_1 + t_2 w_2)}] = E[e^{i(t_1 w'_1 + t_2 w'_2)}]$,

that is,

    $E[e^{i t_1 x_1} e^{i t_2 x_2} e^{-i(t_1 + t_2) x_n}] = E[e^{i t_1 x'_1} e^{i t_2 x'_2} e^{-i(t_1 + t_2) x'_n}]$.

Set $\varphi(t) = E(e^{itx})$, $\psi(t) = E(e^{itx'})$. The relation above becomes

(1)  $\varphi(t_1)\varphi(t_2)\varphi(-t_1 - t_2) = \psi(t_1)\psi(t_2)\psi(-t_1 - t_2)$.

Consider equation (1) for values of $t_1, t_2$ in the neighborhood of $t = 0$.
$\varphi(0) = \psi(0) = 1$, hence there is an interval $|t| < \delta$ in which $\varphi(t)$ and $\psi(t)$ do not
vanish. It is easily shown that $\varphi(t)$ and $\psi(t)$ are each continuous, since $e^{itx}$, in
the neighborhood of $t = 0$, is continuous uniformly for any bounded interval
of $x$, and since $A$ may be chosen so that $1 - F(A)$ and $F(-A)$ are both as small
as desired. In the interval $|t| < \delta$ the function $f(t) = \varphi(t)/\psi(t)$ is continuous.
Also, $\varphi(-t) = \overline{\varphi(t)}$ and $f(-t) = \overline{f(t)}$. Setting $t_2 = 0$ in (1) we obtain
$\varphi(t)\varphi(-t) = \psi(t)\psi(-t)$, hence $|\varphi(t)| = |\psi(t)|$, that is, $|f(t)| = 1$. $f(t)$ takes
values on the unit circle of the complex plane, and $f(0) = 1$, hence there is an
interval $|t| < \delta'$ such that $z = f(t)$ lies on an arc $\gamma$, of length less than $2\pi$,
containing the point $z = 1$. Now consider the functional equation (1) for
$|t_1| < \tfrac{1}{2}\delta'$, $|t_2| < \tfrac{1}{2}\delta'$. (1) becomes

    $f(t_1) f(t_2) f(-t_1 - t_2) = 1$.

The interval $|t| < \delta'$ was so chosen that for $|t_1| < \tfrac{1}{2}\delta'$, $|t_2| < \tfrac{1}{2}\delta'$, it is possible
to define a single-valued branch of the argument of $f(t_1)$, $f(t_2)$, and $f(t_1 + t_2)$.
Letting $t_2 = 0$ we have $f(t) f(-t) = 1$, hence, replacing $f(-t_1 - t_2)$ by
$1/f(t_1 + t_2)$ in the last equation, we have

    $f(t_1) f(t_2) = f(t_1 + t_2)$.

$\arg f(t_1)$, $\arg f(t_2)$, and $\arg f(t_1 + t_2)$ are uniquely determined, except for some
fixed multiple of $2\pi$. If we choose the principal value of the argument, i.e., so
that $0 \le \arg f(t) < 2\pi$, we must have

    $\arg f(t_1) + \arg f(t_2) = \arg f(t_1 + t_2)$

for $|t_1| < \tfrac{1}{2}\delta'$, $|t_2| < \tfrac{1}{2}\delta'$. Since $\arg f(t)$ is continuous, any solution of this well
known functional equation must be of the form $\arg f(t) = at$. $|f(t)| = 1$,
therefore there exists a constant $a$ such that $f(t) = e^{iat}$, for $|t| < \tfrac{1}{2}\delta'$, that is,
$\varphi(t) = e^{iat}\psi(t)$ for $|t| < \tfrac{1}{2}\delta'$. By use of (1) this may be extended to hold for
all $t$ such that $|t| < \epsilon$, where $\epsilon$ is the minimum modulus of all $t$ such that $\varphi(t) = 0$.
(1) may now be used to extend the relation for all $t$ such that $\varphi(t) \neq 0$ by
choosing an $\epsilon$-chain connecting the origin to the point $t$. We know already that
$\varphi(t) = e^{iat}\psi(t)$ if $\varphi(t) = 0$, hence it holds for all $t$. This relation says that
$E(e^{itx}) = E(e^{it(x' + a)})$, hence $x' + a$ and $x$ have the same distribution, thus
completing the demonstration of the theorem.

² The set $S$ is $\epsilon$-connected if any two points $p$, $q$ in $S$ can be connected by an
$\epsilon$-chain, i.e., there exists a set $p_0 = p, p_1, \ldots, p_{n-1}, p_n = q$, such that
$|p_i - p_{i-1}| < \epsilon$, $i = 1, 2, \ldots, n$.

It should be remarked that the set $(x_1 - x_n, \ldots, x_{n-1} - x_n)$ may be replaced
in Theorem Ia by any equivalent set, for example, $(x_1 - \bar{x}, \ldots, x_{n-1} - \bar{x})$.
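
The sufficiency half of Theorem Ia can be seen directly on characteristic functions. The sketch below is an illustration under assumed inputs: the three-point law of $x$ and the shift $a$ are hypothetical, and `char_fn` and `triple` are helper names introduced here. It checks that a translated law gives the same value of the triple product entering the functional equation (1) of the proof, and that the two characteristic functions differ by the factor $e^{iat}$.

```python
import cmath


def char_fn(atoms, t):
    """Characteristic function of a discrete law given as {value: probability}."""
    return sum(p * cmath.exp(1j * t * x) for x, p in atoms.items())


def triple(atoms, t1, t2):
    """The product phi(t1) * phi(t2) * phi(-t1 - t2) appearing in equation (1)."""
    return char_fn(atoms, t1) * char_fn(atoms, t2) * char_fn(atoms, -t1 - t2)


law_x = {-1.0: 0.2, 0.5: 0.5, 2.0: 0.3}          # hypothetical law of x
a = 0.75                                          # hypothetical translation
law_xp = {x - a: p for x, p in law_x.items()}     # law of x' = x - a, so x' + a ~ x

t1, t2 = 0.3, -1.1
print(abs(triple(law_x, t1, t2) - triple(law_xp, t1, t2)))
print(abs(char_fn(law_x, 0.8) - cmath.exp(1j * a * 0.8) * char_fn(law_xp, 0.8)))
```

Both printed differences should be at the level of rounding error. The substance of the theorem is the converse direction, which this check does not address.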

The next result is of the same nature as Theorem Ia except for the
replacement of the location parameter by a scale (positive or negative) parameter.


Theorem Ib: Let $x$ have distribution $F(x)$ such that the zeros of

    $\int_{-\infty}^{\infty} e^{it \log |x|}\, dF(x)$

are nowhere dense, and let $x'$ have distribution $G(x')$. Let $x_1, \ldots, x_n$ and
$x'_1, \ldots, x'_n$ be samples from the distributions of $x$ and $x'$, with $n \ge 3$, then the set
$w_\alpha = x_\alpha / x_n$, $\alpha = 1, \ldots, n - 1$, have the same distribution as the set
$w'_\alpha = x'_\alpha / x'_n$ if and only if there exists a constant $c$ such that $cx'$ and $x$ have the same
distribution.

Proof: The sufficiency of the condition is evident. Suppose, then, as before,
that $w_1, w_2$ have the same joint distribution as $w'_1, w'_2$. $\log |w_1|$ and $\log |w_2|$
have the same joint distribution as $\log |w'_1|$ and $\log |w'_2|$, hence by application
of Theorem Ia to $\log |x|$ and $\log |x'|$ it follows (since the complement of a
nowhere dense set is $\epsilon$-connected for every $\epsilon$) that there exists a constant $a$ such
that

    $\int_{-\infty}^{\infty} e^{it \log |x|}\, dF(x) = \int_{-\infty}^{\infty} e^{it[\log |x'| - a]}\, dG(x')$.

Let $y = e^{-a} x'$, then $|x|$ and $|y|$ have the same distribution, and

(2)  $\int e^{it \log |x|}\, dF(x) = \int e^{it \log |y|}\, dH(y)$,

where $y$ has distribution $H(y)$. We now have to show that either $y$ or $-y$ has
the distribution of $x$, that is, it must be shown that either $H(y) = F(y)$, or
$H(y) = 1 - F(-y)$.

By the first part of the theorem the functions $u_1 = y_1/y_3$ and $u_2 = y_2/y_3$ have
the same joint distribution as $w_1, w_2$. It is clear that the mean value of any
function of $u_1$ and $u_2$ is the same as the mean value of the corresponding
function of $w_1$ and $w_2$. Hence

    $\iiint e^{i(t_1 \log |w_1| + t_2 \log |w_2|)} \operatorname{sgn} w_1 \operatorname{sgn} w_2\, dF(x_1)\, dF(x_2)\, dF(x_3)$
    $\quad = \iiint e^{i(t_1 \log |u_1| + t_2 \log |u_2|)} \operatorname{sgn} u_1 \operatorname{sgn} u_2\, dH(y_1)\, dH(y_2)\, dH(y_3)$,

where $\operatorname{sgn} x = 1$ for $x > 0$, $\operatorname{sgn} x = -1$ for $x < 0$.

    $(\operatorname{sgn} w_1)(\operatorname{sgn} w_2) = (\operatorname{sgn} x_1)(\operatorname{sgn} x_2)$,

so that the last equation becomes

(3)  $\iiint e^{i[t_1(\log |x_1| - \log |x_3|) + t_2(\log |x_2| - \log |x_3|)]} \operatorname{sgn} x_1 \operatorname{sgn} x_2\, dF(x_1)\, dF(x_2)\, dF(x_3)$
     $\quad = \iiint e^{i[t_1(\log |y_1| - \log |y_3|) + t_2(\log |y_2| - \log |y_3|)]} \operatorname{sgn} y_1 \operatorname{sgn} y_2\, dH(y_1)\, dH(y_2)\, dH(y_3)$.

Set

    $\varphi_1(t) = \int e^{it \log |x|}\, dF(x)$;  $\psi_1(t) = \int e^{it \log |y|}\, dH(y)$;

    $\varphi_2(t) = \int e^{it \log |x|} \operatorname{sgn} x\, dF(x)$;  $\psi_2(t) = \int e^{it \log |y|} \operatorname{sgn} y\, dH(y)$.

From (3) we have $\varphi_2(t_1)\varphi_2(t_2)\varphi_1(-t_1 - t_2) = \psi_2(t_1)\psi_2(t_2)\psi_1(-t_1 - t_2)$ for all
$t_1, t_2$, and from (2) we have $\varphi_1(t) = \psi_1(t)$ for all $t$, hence, if $\varphi_1(-t_1 - t_2) \neq 0$,
$\varphi_2(t_1)\varphi_2(t_2) = \psi_2(t_1)\psi_2(t_2)$. By hypothesis the zeros of $\varphi_1(t)$ are nowhere dense,
hence if $\varphi_1(-t_1 - t_2) = 0$ there is a sequence $t_3^{(n)}$ such that $t_3^{(n)} \to -t_1 - t_2$
and $\varphi_1(t_3^{(n)}) \neq 0$. Now take an arbitrary sequence $t_2^{(n)}$ such that $t_2^{(n)} \to t_2$,
then $t_1^{(n)} = -t_3^{(n)} - t_2^{(n)}$ must tend to $t_1$. For each $n$ we have
$\varphi_2(t_1^{(n)})\varphi_2(t_2^{(n)}) = \psi_2(t_1^{(n)})\psi_2(t_2^{(n)})$. All the functions appearing are continuous, thus we see that
$\varphi_2(t_1)\varphi_2(t_2) = \psi_2(t_1)\psi_2(t_2)$ for all $t_1, t_2$. From this it follows directly that either
$\varphi_2(t) = \psi_2(t)$ for all $t$ or $\varphi_2(t) = -\psi_2(t)$ for all $t$. We have³

    $\varphi_1(t) = \int_0^{\infty} e^{it \log x}\, dF(x) + \int_{-\infty}^{0} e^{it \log(-x)}\, dF(x)$,

    $\varphi_2(t) = \int_0^{\infty} e^{it \log x}\, dF(x) - \int_{-\infty}^{0} e^{it \log(-x)}\, dF(x)$,

    $\psi_1(t) = \int_0^{\infty} e^{it \log x}\, dH(x) + \int_{-\infty}^{0} e^{it \log(-x)}\, dH(x)$,

and

    $\psi_2(t) = \int_0^{\infty} e^{it \log x}\, dH(x) - \int_{-\infty}^{0} e^{it \log(-x)}\, dH(x)$.

Combining these expressions with the relations obtained above leads, by Fourier
inversion, to the result that either $F(x) = H(x)$ or $H(x) = 1 - F(-x)$. We
have shown that either $y$ or $-y$ has the same distribution as $x$, that is, either
$e^{-a} x'$ or $-e^{-a} x'$ has the same distribution as $x$.

³ The assumption has been made implicitly that $F(x)$ and $H(x)$ are continuous at $x = 0$,
otherwise the distribution of $x_\alpha / x_n$ is not properly defined, and the functions $\varphi_i(t)$ and $\psi_i(t)$
are then not defined. Similar assumptions will be made whenever necessary in later

Theorem Ib states essentially that the joint distribution of the set $x_\alpha / x_n$,
$\alpha = 1, \ldots, n - 1$, determines the distribution of $x$ except for a scale parameter
and possibly a reflection. In the event that $x$ has an asymmetrical distribution,
and if it is desired to rule out negative changes of scale, a variation of this
procedure is necessary. The next result is appropriate for this situation.

Theorem Ic: Let $x$ have distribution $F(x)$ such that the zeros of $\int e^{it \log |x|}\, dF(x)$
are nowhere dense, and let $x'$ have distribution $G(x')$. Let $x_1, \ldots, x_n$ and
$x'_1, \ldots, x'_n$ be samples from the distributions of $x$ and $x'$, with $n \ge 3$. Express
$x_1, \ldots, x_n$ and $x'_1, \ldots, x'_n$ in spherical coordinates

    $x_1 = r \cos\theta_1$,                                      $x'_1 = r' \cos\theta'_1$,
    $x_2 = r \sin\theta_1 \cos\theta_2$,                          $x'_2 = r' \sin\theta'_1 \cos\theta'_2$,
    $\cdots$
    $x_n = r \sin\theta_1 \sin\theta_2 \cdots \sin\theta_{n-1}$,   $x'_n = r' \sin\theta'_1 \sin\theta'_2 \cdots \sin\theta'_{n-1}$.

Then $\theta_1, \ldots, \theta_{n-1}$ have the same joint distribution as $\theta'_1, \ldots, \theta'_{n-1}$ if and only
if there exists a positive constant $k$ such that $kx'$ and $x$ have the same distribution.

Proof: Sufficiency of the condition is an immediate consequence of the fact
that $\theta_1, \ldots, \theta_{n-1}$ are invariant under the transformation $x = kx'$, with $k > 0$.
If $\theta_1, \ldots, \theta_{n-1}$ have the same joint distribution as $\theta'_1, \ldots, \theta'_{n-1}$ then the set
$\{x_\alpha / x_n\}$ have the same joint distribution as the set $\{x'_\alpha / x'_n\}$, hence, by Theorem
Ib, there exists a constant $c$ such that $cx'$ has the same distribution as $x$. To
establish necessity of the condition we must show that $|c| x'$ has the same
distribution as $x$.

Set $y = |c| x'$, and let $y_1, \ldots, y_n$ be expressed in spherical coordinates;
$y_1, \ldots, y_n$ have the same angular coordinates $\theta'_1, \ldots, \theta'_{n-1}$. This implies
that $x_1/r$ and $x_2/r$ have the same joint distribution as $y_1/R$ and $y_2/R$, where
$R = \sqrt{y_1^2 + \cdots + y_n^2}$. $(x_1/r)/|x_2/r| = x_1/|x_2|$, therefore $x_1/|x_2|$ has the same
distribution as $y_1/|y_2|$, so that

    $\iint e^{it \log |x_1/x_2|} \operatorname{sgn}(x_1/|x_2|)\, dF(x_1)\, dF(x_2) = \iint e^{it \log |y_1/y_2|} \operatorname{sgn}(y_1/|y_2|)\, dH(y_1)\, dH(y_2)$

if $y$ has distribution $H(y)$. $\operatorname{sgn}(x_1/|x_2|) = \operatorname{sgn} x_1$, so that the last equation
yields

    $\int_{-\infty}^{\infty} e^{it \log |x|} \operatorname{sgn} x\, dF(x) \cdot \int_{-\infty}^{\infty} e^{-it \log |x|}\, dF(x)$
    $\quad = \int_{-\infty}^{\infty} e^{it \log |x|} \operatorname{sgn} x\, dH(x) \cdot \int_{-\infty}^{\infty} e^{-it \log |x|}\, dH(x)$.

We know already that $|x|$ and $|y|$ have the same distribution, so that

(4)  $\int_{-\infty}^{\infty} e^{it \log |x|}\, dF(x) = \int_{-\infty}^{\infty} e^{it \log |x|}\, dH(x)$,

thus

(5)  $\int_{-\infty}^{\infty} e^{it \log |x|} \operatorname{sgn} x\, dF(x) = \int_{-\infty}^{\infty} e^{it \log |x|} \operatorname{sgn} x\, dH(x)$,

except possibly for zeros of $\int_{-\infty}^{\infty} e^{it \log |x|}\, dF(x)$. By hypothesis the exceptional
points are nowhere dense, so that, by continuity, (5) holds for all $t$. (4) and
(5) together imply, as in the proof of Theorem Ib, that $F(x) = H(x)$, i.e., $x$ and
$|c| x'$ have the same distribution.
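
The distinction between Theorems Ib and Ic rests on the fact that angular coordinates are unchanged by a positive change of scale but not by a reflection. A minimal sketch of that fact follows, assuming the spherical-coordinate convention of Theorem Ic; the function name `angles` and the three sample values are hypothetical.

```python
import math


def angles(x):
    """Angular coordinates theta_1..theta_{n-1} of x = (x_1..x_n), n >= 2, in the convention
    x_1 = r cos(th_1), x_2 = r sin(th_1) cos(th_2), ..., x_n = r sin(th_1)...sin(th_{n-1})."""
    out = []
    for k in range(len(x) - 2):
        tail = math.sqrt(sum(v * v for v in x[k + 1:]))
        out.append(math.atan2(tail, x[k]))       # lies in [0, pi]
    out.append(math.atan2(x[-1], x[-2]))         # last angle carries the sign of x_n
    return out


sample = [1.0, -2.0, 0.5]                         # hypothetical observations
same = angles([2.5 * v for v in sample])          # positive change of scale
flipped = angles([-v for v in sample])            # reflection x -> -x

print(angles(sample), same, flipped)
```

The ratios $x_\alpha / x_n$ used in Theorem Ib, by contrast, are unchanged by the reflection as well, which is why that theorem determines the distribution only up to sign.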

The next three results are generalizations of Theorems Ia, b, c, to analogous
multivariate situations. The first of these is a direct generalization of
Theorem Ia.

Theorem IIa: Let $x_1, \ldots, x_k$ have joint distribution $F(x_1, \ldots, x_k)$ such
that the complement of the set $S$ of zeros of $\int e^{i \sum t_r x_r}\, dF(x_1, \ldots, x_k)$ is $\epsilon$-connected,
where $\epsilon$ is the g.l.b. of $|t|$ for $(t)$ in $S$, and let $y_1, \ldots, y_k$ have joint distribution
$G(y_1, \ldots, y_k)$. Let $(x_1^\alpha, \ldots, x_k^\alpha)$ and $(y_1^\alpha, \ldots, y_k^\alpha)$, $\alpha = 1, \ldots, n$, be samples
from these distributions, with $n \ge 3$. Then $w_{i\beta} = x_i^\beta - x_i^n$, $i = 1, \ldots, k$,
$\beta = 1, \ldots, n - 1$, have the same joint distribution as the corresponding set
$v_{i\beta} = y_i^\beta - y_i^n$ if and only if there exist constants $a_1, \ldots, a_k$ such that
$y_1 + a_1, \ldots, y_k + a_k$ have the same joint distribution as $x_1, \ldots, x_k$.

Proof: Set

    $\varphi(t_1, \ldots, t_k) = \int e^{i \sum t_r x_r}\, dF(x_1, \ldots, x_k)$,

    $\psi(t_1, \ldots, t_k) = \int e^{i \sum t_r y_r}\, dG(y_1, \ldots, y_k)$.

If $w_{i\beta}$, $i = 1, \ldots, k$, $\beta = 1, 2$, have the same joint distribution as $v_{i\beta}$, then,
as in the proof of Theorem Ia, we have

(6)  $\varphi(t_{11}, \ldots, t_{k1})\, \varphi(t_{12}, \ldots, t_{k2})\, \varphi(-t_{11} - t_{12}, \ldots, -t_{k1} - t_{k2})$
     $\quad = \psi(t_{11}, \ldots, t_{k1})\, \psi(t_{12}, \ldots, t_{k2})\, \psi(-t_{11} - t_{12}, \ldots, -t_{k1} - t_{k2})$.

Again, as before, $|\varphi| = |\psi|$; $\varphi(t_1, \ldots, t_k)$ and $\psi(t_1, \ldots, t_k)$ are continuous;
$\varphi(0, 0, \ldots, 0) = \psi(0, 0, \ldots, 0) = 1$. There will exist a neighborhood $N$ of
$(0, 0, \ldots, 0)$ such that for $(t_1, \ldots, t_k) \in N$ the function
$f(t_1, \ldots, t_k) = \varphi(t_1, \ldots, t_k)/\psi(t_1, \ldots, t_k)$ is defined and continuous. Then there
will exist a neighborhood $N' \subset N$ such that in $N'$ there exists a uniquely determined
branch of $\arg f(t_1, \ldots, t_k)$, continuous in $N'$, and such that if $(t_1, \ldots, t_k) \in N'$ and
$(u_1, \ldots, u_k) \in N'$ then $\arg f(t_1 + u_1, \ldots, t_k + u_k)$ is also uniquely determined
and continuous. For $(t) \in N'$ and $(u) \in N'$, $\arg f$ satisfies the relation

    $\arg f(t_1, \ldots, t_k) + \arg f(u_1, \ldots, u_k) = \arg f(t_1 + u_1, \ldots, t_k + u_k)$.

It is easily shown that any continuous function satisfying the equation above
must be of the form $\sum a_r t_r$, therefore

(7)  $\varphi(t_1, \ldots, t_k) = e^{i \sum a_r t_r}\, \psi(t_1, \ldots, t_k)$;  $(t) \in N'$.

Just as in the proof of Ia the relation (7) may be extended, by use of (6), to
hold for all $t$. This implies, finally, that the set $\{y_i + a_i\}$ have the same
joint distribution as the set $\{x_i\}$.

Theorem IIb is a generalization of Theorem Ib to multivariate distributions.

Theorem IIb: Let $x_1, \ldots, x_k$ have distribution $F(x_1, \ldots, x_k)$ such that the
zeros of $\int e^{i \sum t_r \log |x_r|}\, dF(x_1, \ldots, x_k)$ are nowhere dense, and let $y_1, \ldots, y_k$ have
distribution $G(y_1, \ldots, y_k)$. Let $(x_1^\alpha, \ldots, x_k^\alpha)$ and $(y_1^\alpha, \ldots, y_k^\alpha)$, $\alpha = 1, \ldots, n$,
be samples, with $n \ge 3$. Then the set $w_{i\beta} = x_i^\beta / x_i^n$, $i = 1, \ldots, k$, $\beta = 1, \ldots, n - 1$,
have the same joint distribution as the corresponding set $v_{i\beta} = y_i^\beta / y_i^n$ if
and only if there exist constants $c_1, \ldots, c_k$ such that the set $c_i y_i$ have the same
distribution as the $x_i$.

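Proof: The demonstration is parallel to that of Theorem Ib. By Theorem
IIa there exist $a_1, \ldots, a_k$ such that

    $\int e^{i \sum t_r \log |x_r|}\, dF(x_1, \ldots, x_k) = \int e^{i \sum t_r [\log |y_r| + a_r]}\, dG(y_1, \ldots, y_k)$.

Set $z_r = e^{a_r} y_r$, then

(8)  $\int e^{i \sum t_r \log |x_r|}\, dF(x_1, \ldots, x_k) = \int e^{i \sum t_r \log |z_r|}\, dH(z_1, \ldots, z_k)$,

where $(z_1, \ldots, z_k)$ have distribution function $H(z_1, \ldots, z_k)$.

We shall continue the proof from here under the assumption that $k = 2$.
It will be evident how the proof goes for any $k$. We have, since $z_r^\beta / z_r^3$ have the
same joint distribution as $x_r^\beta / x_r^3$,

(9)  $\iiint e^{i \sum_{r} \sum_{\beta} t_{r\beta} (\log |x_r^\beta| - \log |x_r^3|)} \operatorname{sgn}\Big(\dfrac{x_1^1}{x_1^3}\Big) \operatorname{sgn}\Big(\dfrac{x_1^2}{x_1^3}\Big)\, dF(x_1^1, x_2^1)\, dF(x_1^2, x_2^2)\, dF(x_1^3, x_2^3)$
     $\quad = \iiint e^{i \sum_{r} \sum_{\beta} t_{r\beta} (\log |z_r^\beta| - \log |z_r^3|)} \operatorname{sgn}\Big(\dfrac{z_1^1}{z_1^3}\Big) \operatorname{sgn}\Big(\dfrac{z_1^2}{z_1^3}\Big)\, dH(z_1^1, z_2^1)\, dH(z_1^2, z_2^2)\, dH(z_1^3, z_2^3)$,

with $r = 1, 2$ and $\beta = 1, 2$.

Both members of (9) are evaluated as products, just as was done in previous
proofs, and from the result, combined with (8), we conclude, as in Theorem Ib,
that

    $\iint e^{i \sum t_r \log |x_r|} \operatorname{sgn} x_1\, dF(x_1, x_2) = s_1 \iint e^{i \sum t_r \log |x_r|} \operatorname{sgn} x_1\, dH(x_1, x_2)$,

where $s_1 = \pm 1$, for all $(t_1, t_2)$. Similarly

    $\iint e^{i \sum t_r \log |x_r|} \operatorname{sgn} x_2\, dF(x_1, x_2) = s_2 \iint e^{i \sum t_r \log |x_r|} \operatorname{sgn} x_2\, dH(x_1, x_2)$

and

    $\iint e^{i \sum t_r \log |x_r|} \operatorname{sgn} x_1 \operatorname{sgn} x_2\, dF(x_1, x_2) = s_3 \iint e^{i \sum t_r \log |x_r|} \operatorname{sgn} x_1 \operatorname{sgn} x_2\, dH(x_1, x_2)$,

with $s_2 = \pm 1$, $s_3 = \pm 1$.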

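Set

    $\varphi_1(t_1, t_2) = \iint e^{i \sum t_r \log |x_r|} \operatorname{sgn} x_1\, dF(x_1, x_2)$,

    $\varphi_2(t_1, t_2) = \iint e^{i \sum t_r \log |x_r|} \operatorname{sgn} x_2\, dF(x_1, x_2)$,

    $\varphi_{12}(t_1, t_2) = \iint e^{i \sum t_r \log |x_r|} \operatorname{sgn} x_1 \operatorname{sgn} x_2\, dF(x_1, x_2)$,

and let $\psi_1(t_1, t_2)$, $\psi_2(t_1, t_2)$, and $\psi_{12}(t_1, t_2)$ denote the corresponding transforms
of $H(x_1, x_2)$. We have

(10)  $\varphi_1(t_1, t_2) = s_1 \psi_1(t_1, t_2)$,
      $\varphi_2(t_1, t_2) = s_2 \psi_2(t_1, t_2)$,
      $\varphi_{12}(t_1, t_2) = s_3 \psi_{12}(t_1, t_2)$,

with $s_1 = \pm 1$, $s_2 = \pm 1$, and $s_3 = \pm 1$.

Now, as in (9), by considering $\operatorname{sgn}(x_1^1/x_1^3) \operatorname{sgn}(x_2^2/x_2^3)$ we
obtain the relation

    $\varphi_1(t_{11}, t_{21})\, \varphi_2(t_{12}, t_{22})\, \varphi_{12}(-t_{11} - t_{12}, -t_{21} - t_{22})$
    $\quad = \psi_1(t_{11}, t_{21})\, \psi_2(t_{12}, t_{22})\, \psi_{12}(-t_{11} - t_{12}, -t_{21} - t_{22})$,

showing that $s_1, s_2, s_3$ may be chosen so that $s_1 s_2 s_3 = 1$, that is, $s_1 s_2 = s_3$.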
Consider now the variates $z'_r = s_r z_r$, $r = 1, 2$. Let $K(z'_1, z'_2)$ be the
distribution function of $z'_1, z'_2$. If we let $\theta_1(t_1, t_2)$, $\theta_2(t_1, t_2)$, and $\theta_{12}(t_1, t_2)$ be the
transforms of $K$ which correspond to $\varphi_1(t_1, t_2)$, $\varphi_2(t_1, t_2)$, and $\varphi_{12}(t_1, t_2)$ respectively,
it is evident that

(11)  $\varphi_1(t_1, t_2) = \theta_1(t_1, t_2)$,
      $\varphi_2(t_1, t_2) = \theta_2(t_1, t_2)$,
      $\varphi_{12}(t_1, t_2) = \theta_{12}(t_1, t_2)$.

Moreover, from (8),

    $\iint e^{i \sum t_r \log |x_r|}\, dF(x_1, x_2) = \iint e^{i \sum t_r \log |x_r|}\, dK(x_1, x_2)$.

The last relation, together with the equations (11), imply that $F(x)$ and $K(x)$
coincide in each quadrant, thus $F(x_1, x_2) = K(x_1, x_2)$ for all $x_1, x_2$.

The final result is that $z'_1, z'_2$ have the same distribution as $x_1, x_2$, i.e., $s_1 e^{a_1} y_1$
and $s_2 e^{a_2} y_2$ have the same joint distribution as $x_1$ and $x_2$.

The next result bears the same relation to Theorem IIb that Theorem Ic
bears to Theorem Ib, that is, only positive scale changes are to be permitted.

Theorem IIc: Let $x_1, \ldots, x_k$ have distribution $F(x_1, \ldots, x_k)$ such that the
zeros of $\int e^{i \sum t_r \log |x_r|}\, dF(x_1, \ldots, x_k)$ are nowhere dense, and let $y_1, \ldots, y_k$ have
distribution $G(y_1, \ldots, y_k)$. Let $(x_1^\alpha, \ldots, x_k^\alpha)$ and $(y_1^\alpha, \ldots, y_k^\alpha)$, $\alpha = 1, 2, \ldots, n$,
be samples with $n \ge 3$. Express $x_i^1, \ldots, x_i^n$ and $y_i^1, \ldots, y_i^n$ in spherical
coordinates

    $x_i^1 = r_i \cos\theta_i^1$,                                  $y_i^1 = R_i \cos\varphi_i^1$,
    $x_i^2 = r_i \sin\theta_i^1 \cos\theta_i^2$,                    $y_i^2 = R_i \sin\varphi_i^1 \cos\varphi_i^2$,
    $\cdots$
    $x_i^n = r_i \sin\theta_i^1 \cdots \sin\theta_i^{n-1}$;         $y_i^n = R_i \sin\varphi_i^1 \cdots \sin\varphi_i^{n-1}$.

Then $\{\theta_i^\beta\}$, $i = 1, \ldots, k$, $\beta = 1, \ldots, n - 1$, have the same joint distribution
as $\{\varphi_i^\beta\}$ if and only if there exist constants $k_i > 0$, $i = 1, \ldots, k$, such that the
set $k_i y_i$ have the same joint distribution as the set $x_i$.


Proof: If $\{\theta_i^\beta\}$ have the same distribution as $\{\varphi_i^\beta\}$ then it follows that
$\{x_i^\beta / x_i^n\}$ have the same distribution as $\{y_i^\beta / y_i^n\}$, hence by Theorem IIb there exist
constants $c_i$ such that $\{c_i y_i\}$ have the same distribution as $\{x_i\}$. Set $z_i = |c_i| y_i$; we
wish to show that $\{z_i\}$ have the same distribution as $\{x_i\}$. By equation (8)
in Theorem IIb it is known that $\{|z_i|\}$ have the same distribution as $\{|x_i|\}$;
moreover, if we express $z_i^\alpha$ in spherical coordinates, the angular coordinates are
the same as those of $y_i^\alpha$, therefore

    $\{x_i^\alpha / |x_i^\beta|\}$ have the same distribution as $\{z_i^\alpha / |z_i^\beta|\}$,

since these functions are obtainable in terms of the angular coordinates.

As before, we shall continue the proof from here under the assumption that
$k = 2$. The procedure is a generalization of the procedure in the proof of
Theorem Ic. $\operatorname{sgn}(x_i^1 / |x_i^2|) = \operatorname{sgn} x_i^1$, and similarly for $z$, therefore

(12)  $\iint e^{i \sum t_r (\log |x_r^1| - \log |x_r^2|)} \operatorname{sgn} x_i^1\, dF(x_1^1, x_2^1)\, dF(x_1^2, x_2^2)$
      $\quad = \iint e^{i \sum t_r (\log |x_r^1| - \log |x_r^2|)} \operatorname{sgn} x_i^1\, dH(x_1^1, x_2^1)\, dH(x_1^2, x_2^2)$,  $i = 1, 2$,

where it is assumed that $z_1, z_2$ have distribution $H(z_1, z_2)$. As before, set

    $\varphi(t_1, t_2) = \iint e^{i \sum t_r \log |x_r|}\, dF(x_1, x_2)$,

    $\varphi_i(t_1, t_2) = \iint e^{i \sum t_r \log |x_r|} \operatorname{sgn} x_i\, dF(x_1, x_2)$,  $i = 1, 2$,

    $\varphi_{12}(t_1, t_2) = \iint e^{i \sum t_r \log |x_r|} \operatorname{sgn} x_1 \operatorname{sgn} x_2\, dF(x_1, x_2)$,

and denote the corresponding transforms of $H(x_1, x_2)$ by $\theta(t_1, t_2)$, $\theta_1(t_1, t_2)$,
$\theta_2(t_1, t_2)$, and $\theta_{12}(t_1, t_2)$. It has been remarked already that $\{|z_i|\}$ have the
same distribution as $\{|x_i|\}$, therefore $\theta(t_1, t_2) = \varphi(t_1, t_2)$. Equation (12) yields
the relation $\varphi_i(t_1, t_2)\varphi(-t_1, -t_2) = \theta_i(t_1, t_2)\theta(-t_1, -t_2)$, $i = 1, 2$; the zeros of
$\varphi(t_1, t_2)$ are nowhere dense, so that it can be concluded that $\varphi_i(t_1, t_2) = \theta_i(t_1, t_2)$,
$i = 1, 2$. Now, from an equation similar to (12) we obtain $\varphi_{12}(t_1, t_2) = \theta_{12}(t_1, t_2)$.
As in Theorem IIb, the four relations above together imply that $F(x_1, x_2) =
H(x_1, x_2)$, in other words, $\{|c_i| y_i\}$ have the same distribution as $\{x_i\}$.

We are now in a position to combine some of the preceding theorems so as to
obtain analogous results for scale and location parameters together.

Theorem IIIa: Let $x$ have distribution $F(x)$ such that the zeros of $\int e^{itx}\, dF(x)$
satisfy the condition of Theorem Ia, and the zeros of

    $\iiint e^{i[t_1 \log |x_1 - x_3| + t_2 \log |x_2 - x_3|]}\, dF(x_1)\, dF(x_2)\, dF(x_3)$

are nowhere dense, and let $y$ have distribution $G(y)$. Let $x_1, \ldots, x_n$ and
$y_1, \ldots, y_n$ be samples, with $n \ge 9$. Then $w_\alpha = \dfrac{x_\alpha - x_n}{x_{n-1} - x_n}$, $\alpha = 1, \ldots, n - 2$,
have the same joint distribution as the corresponding set $w'_\alpha = \dfrac{y_\alpha - y_n}{y_{n-1} - y_n}$ if and
only if there exist constants $a$, $c$, such that $c(y - a)$ and $x$ have the same distribution.
Proof: Sufficiency of the condition is an immediate consequence of the fact
that $w'_\alpha$ is invariant under transformations of the form $y' = c(y - a)$. Assume
then that $\{w_\alpha\}$ and $\{w'_\alpha\}$ have the same joint distribution. By elementary
transformations it is evident that the functions

    $\dfrac{x_1 - x_3}{x_7 - x_9}$, $\dfrac{x_4 - x_6}{x_7 - x_9}$, $\dfrac{x_2 - x_3}{x_8 - x_9}$, $\dfrac{x_5 - x_6}{x_8 - x_9}$

have the same joint distribution as the corresponding functions of the $y$'s, if
$n \ge 9$. Since $x_1, \ldots, x_n$ form a sample it follows that the pairs $\{x_1 - x_3, x_2 - x_3\}$,
$\{x_4 - x_6, x_5 - x_6\}$, $\{x_7 - x_9, x_8 - x_9\}$ have the same joint distributions
and are pairwise independent, and similarly for the corresponding functions
of the $y$'s. Theorem IIb assures the existence of constants $c_1, c_2$, such
that $c_1(y_1 - y_3)$, $c_2(y_2 - y_3)$ have the same joint distribution as $(x_1 - x_3)$,
$(x_2 - x_3)$. Considering separately the marginal distributions it is seen that
$c_1(y_1 - y_3)$ has the same distribution as $c_2(y_2 - y_3)$. $y_1 - y_3$ and $y_2 - y_3$ have
the same distribution, therefore either $c_1 = c_2$, or $c_2 = -c_1$. Set $u_\alpha = x_\alpha - x_3$,
$v_\alpha = c_1(y_\alpha - y_3)$, $\alpha = 1, 2$. We have, for the distributions of $(u_1, u_2)$ and
$(v_1, v_2)$, relations corresponding to (10) in Theorem IIb, with the additional
condition that $s_1 = s_2$, because of the symmetry in the variables. This implies
that either $(v_1, v_2)$ or $(-v_1, -v_2)$ have the same joint distribution as $(u_1, u_2)$,
that is, there exists $c$ such that $c(y_1 - y_3)$ and $c(y_2 - y_3)$ have the same joint
distribution as $x_1 - x_3$ and $x_2 - x_3$. Application of Theorem Ia now completes
the proof.

Just as before, there is an analogous situation when we consider angular
coordinates instead of quotients. The proof is immediate; the angular
coordinates determine the angular coordinates of $\{x_1 - x_3, x_2 - x_3\}$, $\{x_4 - x_6, x_5 - x_6\}$,
and $\{x_7 - x_9, x_8 - x_9\}$, arranged as a sample. Then the constants $c_1, c_2$ in
the proof of Theorem IIIa are both positive; it follows that $c_1 = c_2$.
Application of Theorem Ia gives

Theorem IIIb: Let $x_1, \ldots, x_n$ and $y_1, \ldots, y_n$ satisfy the hypotheses of
Theorem IIIa. Set

    $x_1 - x_n = r \cos\theta_1$,                                  $y_1 - y_n = r' \cos\theta'_1$,
    $x_2 - x_n = r \sin\theta_1 \cos\theta_2$,                      $y_2 - y_n = r' \sin\theta'_1 \cos\theta'_2$,
    $\cdots$
    $x_{n-1} - x_n = r \sin\theta_1 \cdots \sin\theta_{n-2}$;       $y_{n-1} - y_n = r' \sin\theta'_1 \cdots \sin\theta'_{n-2}$.

Then $\theta_1, \ldots, \theta_{n-2}$ have the same joint distribution as $\theta'_1, \ldots, \theta'_{n-2}$ if and only if
there exist constants $a$ and $c > 0$ such that $c(y - a)$ has the same distribution as $x$.

Theorem IVa is a generalization of Theorem Ia to cover arbitrary linear
combinations of some subset of the sample.

Theorem IVa: Suppose $x$ has distribution $F(x)$ such that $\int e^{itx}\, dF(x)$ does not
vanish, and let $y$ have distribution $G(y)$. Consider the functions

    $w_\alpha = x_\alpha - \sum_{\beta=1}^{n-m} l_{\alpha\beta}\, x_{m+\beta}$,  $w'_\alpha = y_\alpha - \sum_{\beta=1}^{n-m} l_{\alpha\beta}\, y_{m+\beta}$,
    $\alpha = 1, 2, \ldots, m$,  $\beta = 1, 2, \ldots, n - m$,

and suppose that $m > n - m$. Then, if $\{w_\alpha\}$ have the same joint distribution
as $\{w'_\alpha\}$ and if $\sum_{\beta=1}^{n-m} l_{\alpha\beta} \neq 1$ for some $\alpha$, it follows that $F(y) = G(y)$. If
$\sum_{\beta} l_{\alpha\beta} = 1$ for all $\alpha$ there exists a constant $a$ such that $F(y - a) = G(y)$.
0 

Proof: Denote the characteristic functions of $x$ and $y$ by $\varphi(t)$ and $\psi(t)$
respectively. By expressing the fact that $\{w_\alpha\}$ and $\{w'_\alpha\}$, $\alpha = 1, 2, \ldots, n - m + 1$,
have the same characteristic function we obtain the functional equation

    $\prod_{\alpha=1}^{n-m+1} \varphi(t_\alpha) \prod_{\beta=1}^{n-m} \varphi\Big(-\sum_{\alpha=1}^{n-m+1} l_{\alpha\beta} t_\alpha\Big) = \prod_{\alpha=1}^{n-m+1} \psi(t_\alpha) \prod_{\beta=1}^{n-m} \psi\Big(-\sum_{\alpha=1}^{n-m+1} l_{\alpha\beta} t_\alpha\Big)$.

By hypothesis $\varphi(t)$ does not vanish, therefore $\psi(t)$ has no zeros, because of the
relation above. $\varphi(t)$ and $\psi(t)$ are continuous, thus the function $f(t) =
\log \varphi(t) - \log \psi(t)$ can be uniquely defined in a continuous manner for all $t$.
The equation above becomes

(13)  $\sum_{\alpha=1}^{n-m+1} f(t_\alpha) + \sum_{\beta=1}^{n-m} f\Big(-\sum_{\alpha=1}^{n-m+1} l_{\alpha\beta} t_\alpha\Big) = 0$.

The constants $l_{\alpha\beta}$ are necessarily linearly dependent, so that, for some $\alpha$, $l_{\alpha\beta}$
can be expressed as a linear combination of the others; suppose then that

    $l_{n-m+1,\beta} = \sum_{\alpha=1}^{n-m} e_\alpha l_{\alpha\beta}$.

Putting these values in (13) we have

(14)  $\sum_{\alpha=1}^{n-m+1} f(t_\alpha) + \sum_{\beta=1}^{n-m} f\Big(-\sum_{\alpha=1}^{n-m} l_{\alpha\beta} (t_\alpha + e_\alpha t_{n-m+1})\Big) = 0$.

It can be assumed that $\sum e_\alpha^2 \neq 0$, for, if $e_\alpha = 0$ for all $\alpha$, we have $l_{n-m+1,\beta} = 0$,
$\beta = 1, \ldots, n - m$, that is, $w_{n-m+1} = x_{n-m+1}$ and $w'_{n-m+1} = y_{n-m+1}$, hence $x$
and $y$ have the same distribution. Assuming $e_1 \neq 0$, set $t_\alpha = -e_\alpha t_{n-m+1}$,
$\alpha = 2, \ldots, n - m$, in (14), obtaining

(15)  $f(t_1) + \sum_{\alpha=2}^{n-m} f(-e_\alpha t_{n-m+1}) + f(t_{n-m+1}) + \sum_{\beta=1}^{n-m} f(-l_{1\beta}(t_1 + e_1 t_{n-m+1})) = 0$,

now, recalling that $f(0) = 0$, set $t_{n-m+1} = 0$, getting $f(t_1) + \sum_{\beta=1}^{n-m} f(-l_{1\beta} t_1) = 0$.
Evaluating this with argument $t_1 + e_1 t_{n-m+1}$, and substituting back in (15) it
appears that

(16)  $f(t_1) + f(t_{n-m+1}) + \sum_{\alpha=2}^{n-m} f(-e_\alpha t_{n-m+1}) = f(t_1 + e_1 t_{n-m+1})$.

Now setting $t_1 = 0$ in (16) we have the relation

    $f(t_{n-m+1}) + \sum_{\alpha=2}^{n-m} f(-e_\alpha t_{n-m+1}) = f(e_1 t_{n-m+1})$,

thus we have finally $f(t_1) + f(e_1 t_{n-m+1}) = f(t_1 + e_1 t_{n-m+1})$, or, since $e_1 \neq 0$,
$f(t_1 + t_2) = f(t_1) + f(t_2)$. The last relation implies that $f(t) = ct$, since $f(t)$ is
continuous. Now replace $f(t)$ by $ct$ in (13), getting

    $c \Big\{ \sum_{\alpha=1}^{n-m+1} t_\alpha - \sum_{\alpha=1}^{n-m+1} \sum_{\beta=1}^{n-m} l_{\alpha\beta} t_\alpha \Big\} = 0$,

that is, either $c = 0$, or $\sum_{\beta} l_{\alpha\beta} = 1$ for all $\alpha$. We conclude then that $\varphi(t) =
\psi(t)$, unless $\sum_{\beta} l_{\alpha\beta} = 1$ for all $\alpha$. If $\sum_{\beta} l_{\alpha\beta} = 1$ for all $\alpha$ we have $\varphi(t) = e^{ct}\psi(t)$.
$\varphi(-t) = \overline{\varphi(t)}$ and $\psi(-t) = \overline{\psi(t)}$, hence $c$ is of the form $c = ia$, where $a$ is real,
in other words $\varphi(t) = e^{iat}\psi(t)$, thus concluding the proof of the theorem.

It was assumed in Theorem IVa that $\varphi(t)$ has no zeros. If $\varphi(t)$ has zeros we have proved that, for an interval $|t| < \epsilon$, $\varphi(t) = \psi(t)$ (or $\varphi(t) = e^{iat}\psi(t)$). This does not necessarily imply the result of Theorem IVa, but it does imply at least that if the kth moments of x and of y (or of y − a) both exist they are equal.

The last result in this series can be proved by methods similar to those used in Theorem IVa.

Theorem IVb. Let x and y satisfy the hypotheses of Theorem IVa. Suppose, moreover, that $m > 2(n - m)$, that the rank of $\|l_{\alpha\beta}\|$ is $n - m$, and that $\sum_{\beta=1}^{n-m} l_{\alpha\beta} \neq 1$ for at least $2m - n$ values of $\alpha$. Then, if there exist constants $\{c_\alpha\}$ such that the set $\{c_\alpha w_\alpha\}$ formed from the y's has the same joint distribution as the set $\{w_\alpha\}$ formed from the x's, it follows that, for some $\alpha$, $c_\alpha y$ has the same distribution as x.


3. Application to Composite Hypotheses. The results of section 2 have a significant application in the theory of testing composite hypotheses. Suppose that x has a distribution of the form $F(x, \theta_1, \theta_2)$, and that the hypothesis $\theta_2 = \theta_2^0$ is to be tested, without reference to the value of $\theta_1$. We assume that the parameters are independent, i.e., $F(x, \theta_1, \theta_2) = F(x, \theta_1', \theta_2')$ implies that $\theta_1 = \theta_1'$ and $\theta_2 = \theta_2'$. It is true in a wide class of important cases that, given a sample $x_1, \ldots, x_n$ from the distribution $F(x, \theta_1, \theta_2)$, there exist functions $y_\alpha(x_1, \ldots, x_n)$, $\alpha = 1, 2, \ldots, p$, such that $\{y_\alpha\}$ have joint distribution independent of $\theta_1$, but depending on $\theta_2$. Now if the $\{y_\alpha\}$ are such that their joint distribution determines the original distribution, except for $\theta_1$, one can reasonably use the p-dimensional distribution of the $\{y_\alpha\}$ for testing the hypothesis $\theta_2 = \theta_2^0$, thus reducing the composite hypothesis to a simple hypothesis. In testing this simple hypothesis, every alternative hypothesis (corresponding to a value of $\theta_2$) determines a distribution of x among the alternatives $F(x, \theta_1, \theta_2)$ except for the unknown $\theta_1$; that is, there is a one-to-one correspondence between the two sets of alternative hypotheses, expressed by the fact that if $\theta_2 \neq \theta_2'$ then the distributions of the set $\{y_\alpha\}$ corresponding to $\theta_2$ and to $\theta_2'$ must be different.

Suppose, for example, that it is desired to test whether $y = x - a$ for some a has the distribution $F(y, \theta^0)$, with the assumption that, for some a, y has the distribution $F(y, \theta)$. Given a sample one can form the set $w_\alpha = x_\alpha - x_n$, $\alpha = 1, 2, \ldots, n - 1$, obtaining the distribution $G(w_1, \ldots, w_{n-1}, \theta)$; now consider the simple hypothesis $\theta = \theta^0$, knowing that G determines $\theta$, by Theorem Ia. Similarly one can test whether $cx$, for some $c \neq 0$, has distribution $F(y, \theta^0)$, by forming $w_\alpha = x_\alpha / x_n$, $\alpha = 1, \ldots, n - 1$, or by expressing $(x_1, \ldots, x_n)$ in spherical coordinates and considering the angular coordinates, according to whether both positive and negative or only positive values of c are to be allowed.

In the same way one can test the hypothesis $\theta = \theta^0$ under the assumption that $c(x - a)$ has distribution $F(y, \theta)$ by forming

$$w_\alpha = \frac{x_\alpha - x_n}{x_{n-1} - x_n}, \qquad \alpha = 1, \ldots, n - 2,$$

or by expressing $(x_1 - x_n, \ldots, x_{n-1} - x_n)$ in spherical coordinates and considering the angular coordinates.

Theorem IVa may be applied to analogous problems, in which the hypothesis $\theta = \theta^0$ is to be tested under the assumption that $y = u - \sum a_i x_i$ has distribution $F(y, \theta)$ for fixed values of the $x_i$, with the $a_i$ unknown. In such problems there exist linear combinations of the observed values of y which are independent of the $a_i$. By Theorem IVa, under certain conditions the joint distribution of these linear combinations determines the original distribution of y, without regard to the $a_i$.

In applying some of the preceding results we must verify in certain cases that the zeros of $\int e^{itx}\,dF(x)$ are nowhere dense, for a certain distribution function. By a change of variable the condition of Theorem Ib can be stated in this form; moreover, if $F(x)$ satisfies this condition it is evident that it satisfies the condition of Theorem Ia. A sufficient condition applicable to a considerable class of cases has been obtained by Levinson [4]: if $f(x)$ is $O(e^{-\theta(x)})$ as $x \to \infty$, where $\theta(x)$ is monotone and $\int^{\infty} \theta(x)\,x^{-2}\,dx$ diverges to $\infty$, then $\int_{-\infty}^{\infty} e^{itx} f(x)\,dx$ cannot vanish on an interval without vanishing identically. It is evident that it is likewise sufficient if the corresponding condition holds as $x \to -\infty$ instead of $+\infty$. In particular, if there exists A such that $f(x) = 0$ for $x > A$ (or for $x < A$) it is a consequence of the Levinson result that $\int e^{itx} f(x)\,dx$ has no intervals of zeros.

It can be established easily that if f(x ) is majorized by | x e > 0, in the 

neighborhood of the origin, then f e ' n ° sM f(x) dx has no intervals of zeros. 


As a simple example consider the rectangular distribution on (0, 1). Let $(x - a)/r$ have this distribution with a unknown, $r > 0$, and suppose that we are interested only in r. Given a sample $x_1, \ldots, x_n$ form the functions $y_\alpha = (x_\alpha - x_n)/r$, $\alpha = 1, \ldots, n - 1$. Set $y_M = \max(y_\alpha, 0)$, $y_L = \min(y_\alpha, 0)$. Then it can be shown that $y_1, \ldots, y_{n-1}$ have probability density $(1 - y_M + y_L)$ in the region $-1 < y_\alpha < 1$, $y_M - y_L < 1$, zero elsewhere. $\psi = y_M - y_L$ is of course the quotient of the sample range by r. It can be shown that $\psi$ has density $n(n - 1)(1 - \psi)\psi^{n-2}\,d\psi$. Theorem Ia makes it possible to base any tests not involving a on the distribution of the $y_\alpha$, since if the $y_\alpha$ have the stated distribution then $(x - a)/r$ for some a must have the rectangular distribution.
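
The range density quoted above can be checked numerically. The following is only an illustrative sketch: the function names, the seed, and the number of trials are assumptions of this example, not part of the paper. It simulates $\psi = y_M - y_L$ for a rectangular sample and compares its average with $(n-1)/(n+1)$, the mean of the density $n(n-1)(1-\psi)\psi^{n-2}$ on (0, 1).

```python
import random

def range_over_r(n, a, r, rng):
    """Draw n values x = a + r*u, u uniform on (0, 1), and return
    psi = y_M - y_L, i.e. the sample range divided by r."""
    xs = [a + r * rng.random() for _ in range(n)]
    ys = [(x - xs[-1]) / r for x in xs[:-1]]   # y_alpha = (x_alpha - x_n) / r
    y_max = max(ys + [0.0])                    # y_M = max(y_alpha, 0)
    y_min = min(ys + [0.0])                    # y_L = min(y_alpha, 0)
    return y_max - y_min

def simulated_mean(n, a, r, trials=20000, seed=0):
    """Monte Carlo average of psi (seed and trial count are arbitrary)."""
    rng = random.Random(seed)
    return sum(range_over_r(n, a, r, rng) for _ in range(trials)) / trials

def theoretical_mean(n):
    """Mean of the density n(n-1)(1-psi)psi^(n-2) on (0, 1)."""
    return (n - 1) / (n + 1)
```

Since the $y_\alpha$ are differences of observations, the simulated value should not depend on the location a, which is the point of the reduction.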

Similarly, suppose $y = (x - a)/r$ has the distribution $e^{-y}$, $y > 0$, for some a, r. Then $w_\alpha = (x_\alpha - x_n)/r$, $\alpha = 1, 2, \ldots, n - 1$, have distribution density

$$\frac{1}{n}\, e^{-\sum w_\alpha + n w_L},$$

where $w_L = \min(0, w_\alpha)$. Again, the latter distribution may be used to estimate r.

Let us examine the distributions of functions of the type considered, in the case of normality. Assume that $x_1, \ldots, x_n$ are a sample of n observations from a normal distribution with unit variance and unknown mean. The variables $y_\alpha = x_\alpha - x_1$, $\alpha = 2, \ldots, n$, have a joint normal distribution with zero means and matrix of variances and covariances $\|A^{ij}\| = \|1 + \delta_{ij}\|$. Then Theorem Ia shows that if $\{y_\alpha\}$ have this joint distribution then x is normally distributed with unit variance. Note that $\chi^2_{n-1} = \sum A_{ij} y_i y_j = \sum (x_\alpha - \bar{x})^2$. If we had $x = x'/\sigma$, then $\sum (x'_\alpha - \bar{x}')^2 = \sigma^2 \chi^2_{n-1}$, giving the estimate

$$\frac{1}{n-1} \sum (x'_\alpha - \bar{x}')^2 \quad \text{for } \sigma^2.$$


There are, of course, many ways in which the matrix $\|A_{ij}\|$ may be transformed into a diagonal matrix in order to obtain a new set of independently distributed variates; one convenient set is the set

$$\sqrt{\tfrac{1}{2}}\, y_2, \quad \sqrt{\tfrac{2}{3}}\Big(y_3 - \frac{y_2}{2}\Big), \quad \ldots, \quad \sqrt{\tfrac{n-1}{n}}\Big(y_n - \frac{1}{n-1} \sum_{\alpha=2}^{n-1} y_\alpha\Big).$$

In terms of the original x's we have

$$\sqrt{\tfrac{1}{2}}\,(x_2 - x_1), \quad \sqrt{\tfrac{2}{3}}\Big(x_3 - \tfrac{1}{2}(x_1 + x_2)\Big), \quad \ldots, \quad \sqrt{\tfrac{n-1}{n}}\Big(x_n - \frac{1}{n-1} \sum_{i=1}^{n-1} x_i\Big);$$

these functions of the data are independently distributed according to the normal distribution with zero mean and unit variance.
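
As an informal check of the statement above, the coefficients of these linear functions of the x's can be written out and tested for orthonormality; for independent unit-variance normal x's, orthonormal rows that each sum to zero give independent unit normal variates free of the mean. This is a sketch with hypothetical helper names (`helmert_rows`, `dot`), not code from the paper.

```python
import math

def helmert_rows(n):
    """Rows k = 2..n: coefficients of x_1..x_n in
    sqrt((k-1)/k) * (x_k - mean(x_1, ..., x_{k-1}))."""
    rows = []
    for k in range(2, n + 1):
        scale = math.sqrt((k - 1) / k)
        # -scale/(k-1) on the first k-1 entries, +scale on x_k, zeros after
        row = [-scale / (k - 1)] * (k - 1) + [scale] + [0.0] * (n - k)
        rows.append(row)
    return rows

def dot(u, v):
    """Ordinary inner product of two coefficient rows."""
    return sum(a * b for a, b in zip(u, v))
```

Each row sums to zero, so the resulting variates do not involve the unknown mean, and the unit row lengths give unit variance.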

Similarly, in the case of a sample $x_1, \ldots, x_n$ from a normal distribution with zero mean and unknown variance, there exists a set of $n - 1$ functions with distributions independent of the variance. A convenient set of functions is the set

$$t_m = \frac{\sqrt{m}\, x_{m+1}}{\sqrt{\sum_{i=1}^{m} x_i^2}}, \qquad m = 1, \ldots, n - 1.$$

It is known (see Bartlett [1]) that the variables $t_m$ are independently distributed according to Student t-distributions with m degrees of freedom respectively. The set $t_m$ determines the set of angular coordinates obtained by expressing $x_1, \ldots, x_n$ in spherical coordinates, hence we can conclude, conversely, that if $\{t_m\}$ have this joint distribution then x is normal with mean zero.





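Finally we can eliminate both mean and variance. Suppose $x_1, \ldots, x_n$ are a sample from some normal distribution. The variables

$$u_m = \sqrt{\frac{m}{m+1}}\Big(x_{m+1} - \frac{1}{m} \sum_{i=1}^{m} x_i\Big), \qquad m = 1, 2, \ldots, n - 1,$$

are normal and independent with mean zero and some variance. Then we have the set

$$t'_r = \frac{\sqrt{r}\, u_{r+1}}{\sqrt{\sum_{i=1}^{r} u_i^2}}$$

independently distributed according to t-distributions with r degrees of freedom respectively. It may be convenient for computational purposes to make use of the identity

$$\sum_{i=1}^{r} u_i^2 = \sum_{j=1}^{r+1} (x_j - \bar{x}_{(r+1)})^2,$$

where $\bar{x}_{(r+1)}$ is the mean of $x_1, \ldots, x_{r+1}$. We then have

$$t'_r = \sqrt{\frac{r(r+1)}{r+2}}\; \frac{x_{r+2} - \bar{x}_{(r+1)}}{\sqrt{\sum_{j=1}^{r+1} (x_j - \bar{x}_{(r+1)})^2}}, \qquad r = 1, \ldots, n - 2.$$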


Now, by Theorem IIIc, we know that if the set $\{t'_r\}$ has this specified distribution then x must be distributed according to some normal distribution. The set $\{t'_r\}$ may be used to test the goodness of fit of the observations to normality, by first adjusting the set $\{t'_r\}$ to a standard basis of comparison, i.e., by considering $F_r(t'_r)$, where $F_r$ is the corresponding cumulative distribution function, and then applying, for example, a $\chi^2$ goodness of fit test to these $n - 2$ quantities, with respect to the rectangular distribution on (0, 1).
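
A small sketch of how such quantities might be computed is given here. It assumes the form $t'_r = \sqrt{r(r+1)/(r+2)}\,(x_{r+2} - \bar{x}_{(r+1)}) / \sqrt{\sum_{j \le r+1} (x_j - \bar{x}_{(r+1)})^2}$, with $\bar{x}_{(r+1)}$ the mean of the first $r+1$ observations; the function name `t_prime` is hypothetical. The conversion to $F_r(t'_r)$ and the goodness of fit test itself are not shown.

```python
import math

def t_prime(xs):
    """Return [t'_1, ..., t'_{n-2}] for a sample xs of length n >= 3.

    Assumes the first r+1 observations are never all equal, so that
    the sum of squared deviations in the denominator is positive."""
    n = len(xs)
    out = []
    for r in range(1, n - 1):
        head = xs[: r + 1]                       # x_1, ..., x_{r+1}
        mean = sum(head) / (r + 1)
        ss = sum((x - mean) ** 2 for x in head)  # squared deviations from mean
        coeff = math.sqrt(r * (r + 1) / (r + 2))
        out.append(coeff * (xs[r + 1] - mean) / math.sqrt(ss))
    return out
```

Because each quantity is a ratio of deviations, replacing every $x$ by $a + bx$ with $b > 0$ leaves the whole set unchanged, which is what makes it free of both mean and variance.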


REFERENCES 

[1] M. S. Bartlett, Proc. Camb. Phil. Soc., Vol. 30 (1934), pp. 164-169.

[2] J. Neyman and E. S. Pearson, Biometrika, Vol. XXA (1928), pp. 175-240, 264-294.

[3] R. A. Fisher, Proc. Roy. Soc. A, Vol. 144 (1934), pp. 285-307.

[4] N. Levinson, Proc. London Math. Soc., Vol. 41 (1936), pp. 393-407.

[5] E. J. G. Pitman, Biometrika, Vol. 30 (1939), pp. 391-421.

Princeton University, 

Princeton, N. J. 



THE SELECTION OF VARIATES FOR USE IN PREDICTION WITH 
SOME COMMENTS ON THE GENERAL PROBLEM OF 
NUISANCE PARAMETERS 

By Harold Hotelling 

1. Maximum Correlation as a Test. For predicting or estimating a particular
variate y there is frequently available an embarrassingly large number of other 
variates having some correlation with y. For example, in fitting demand 
functions by means of economic time series, the number of series of observations 
having some relation to the demand which is sought to be estimated is apt to be 
very large, whereas the number of good independent observations on each is 
quite small. The proper coefficients in the regression equation must ordinarily 
be determined from the observations, and must not exceed in number the
observations on each variate. Furthermore, in order to have a measure of error
that will make it possible to distinguish real effects from those due to chance, 
it is necessary that the number of predictors 1 shall be enough less than the 
number of observations on each variate so that the residual chance variance 
can be determined with an appropriate degree of accuracy. It is desirable to 
select a set of predictors yielding estimates of maximum but determinable ac¬ 
curacy, and at the same time to avoid the fallacies of selection among numerous 
results of that one which appears most significant and treating it as if it were 
the only one examined. 

Considerations other than maximum and determinate accuracy are of prac¬ 
tical importance. The labor of calculation by the method of least squares 
becomes a serious obstacle to the use of the theoretically optimum set of vari¬ 
ates when these are very numerous, though the rapid current development of 
mechanical and electrical devices suitable for these computations offers a hope 
that the limits now set in practice in this way will soon be considerably increased. 
Furthermore, predictions or estimates must, as in speculative business or in
military activity, be made from moment to moment, often in a rough manner 
by persons incapable of or averse to using complex formulae, and in such activi¬ 
ties frequent revisions of the regression equations must be made to accord with 
altered conditions. Also, in temporal predictions, the time of availability of
the values of the predictors is important, since an early prediction (e.g. of the
size of a harvest) is more valuable than a later one of the same accuracy.

1 I use this term for what are often called the independent variates in a regression
equation, since these ordinarily are not really independent in the probability sense.
Similarly I shall call the "dependent" variate the predictand. By prediction I mean merely the
use of regression equations to estimate some unknown variate by means of the values of
related variates, without any necessary connotation of temporal order, though the most
interesting applications seem for the most part to be those in which we pass from a
knowledge of the past to an estimate of the future.

If we make the usual assumption² that the probability distribution of y is, for every set of values of the predictors, normal with a fixed variance $\sigma^2$ and an expectation that is a linear function of the predictors, we shall wish to minimize $\sigma^2$ subject to appropriate limitations, and this amounts to the same thing as maximizing the multiple correlation $\rho$ of y with the predictors, since $1 - \rho^2$ is the ratio of $\sigma^2$ to the total variance of y, which is the same for all sets of predictors. The estimates s and R of $\sigma$ and $\rho$ obtained from the available sample are of course a different matter. But it is clear that the value of R provides a suitable criterion of choice under the following conditions. We are called upon to choose one among two or more sets, each consisting of a fixed number of predictors; for each predictor we have a known value corresponding to each of the values $y_1, \ldots, y_N$ observed for the predictand; and there is no basis for preferring one of these sets to another either in theory, in observations extraneous to those just specified, or in cost or time of availability. In particular, if just one predictor is to be used, that having the highest sample correlation with the predictand should under these conditions be the one adopted. But in making such a choice a test of its accuracy is required, to take account of the possibility that the wrong choice has been made because of chance fluctuations in the sample correlation coefficients.

There are innumerable economic variates available for prediction of 
business conditions, and most of these are highly correlated with each other. 
The selection of one business index instead of another for a particu¬ 
lar purpose will involve the question which has exhibited the higher correlation 
with the quantity to be predicted, and consequently the question of the definite¬ 
ness with which the difference between the calculated correlations can be 
regarded as significant. 

Our problem evidently has a bearing on governmental policy in selecting
among the numerous series of data those whose continuation will be most valu¬ 
able. The high cost of assembling these statistics dictates a careful selection of 
a limited number of series having little correlation with each others' current 
values, but with correlations as great as possible with those things whose predic¬ 
tion or estimation is most important. 

2. The Choice of One Predictor with Two Available. Let us take first the simplest case, which may be illustrated by a Michigan State College problem of which Dr. W. D. Baten has told me. The ultimate weight of a mature ox is estimated by means of his length at an early age. The question has been raised, however, whether a more accurate prediction might not be made by means of the calf's girth at his heart. Records were at hand of 13 oxen showing their lengths and girths as calves and also their weights when mature. A regression equation involving both length and girth would presumably give greater accuracy than either variate alone; but it appears that those who make the estimates desire a simple formula involving only one variate. Suppose, then, that in such a sample the correlation of weight with length is $r_1 = .7$, that the correlation of weight with girth is $r_2 = .5$, and that the correlation of girth with length is $r_0 = .4$. Is the difference $r_1 - r_2 = .2$ sufficiently great in relation to its sampling errors to warrant the inference that length is really a better predictor than girth, or must the question be left in abeyance until more observations can be accumulated?

2 We shall not here go into the question of the applicability of these standard assumptions to time series otherwise than to note that some transformations of observations ordered in time are usually necessary and sufficient to obtain quantities satisfying the assumptions so closely that deviations from them cannot be detected. Such transformations include replacing a variate by its logarithm, and eliminating trend and seasonal variations by least squares. In view of the satisfactory adjusted observations found empirically by these and similar methods, the usual objections to studying time series by exact methods seem much exaggerated.

A straightforward procedure which would have been used with little question before the advent of modern exact methods is to calculate the asymptotic approximation to the standard error of $r_1 - r_2$ by the differential method, assuming the three variates to have the trivariate normal distribution, and to regard the difference of the correlations as significant if it exceeds a multiple of this standard error determined by the tables of the normal distribution. The calculation of the asymptotic approximation $\sigma_{r_1 - r_2}$ may be carried out in the following manner. Let $\rho_1$, $\rho_2$, and $\rho_0$ be the population values of $r_1$, $r_2$, and $r_0$ respectively. Then if $\sigma_{ij}$ denote the population covariance of $x_i$ and $x_j$ ($i, j = 0, 1, 2$), we have

$$\rho_1 = \frac{\sigma_{01}}{\sqrt{\sigma_{00}\sigma_{11}}},$$

with similar formulae for $\rho_2$ and $\rho_0$. Likewise the sample estimates of these parameters are given by such expressions as

$$r_1 = \frac{s_{01}}{\sqrt{s_{00}s_{11}}}.$$

Taking the logarithm of this last expression, expanding about the population values, denoting by the operator $\delta$ the deviation of sample from population values of the covariances, and the resultant deviation in $r_1$, and dropping terms of order higher than the first, we have:

$$\delta r_1 = \rho_1 \Big(\frac{\delta s_{01}}{\sigma_{01}} - \frac{\delta s_{00}}{2\sigma_{00}} - \frac{\delta s_{11}}{2\sigma_{11}}\Big).$$

In the same way

$$\delta r_2 = \rho_2 \Big(\frac{\delta s_{02}}{\sigma_{02}} - \frac{\delta s_{00}}{2\sigma_{00}} - \frac{\delta s_{22}}{2\sigma_{22}}\Big).$$

The asymptotic value of the sampling covariance is obtained by multiplying these two expressions together and taking the expectation. The sampling covariance of two estimates of covariance of the usual kind (sum of products divided by number of degrees of freedom) in the same sample, having n degrees of freedom (which ordinarily means that there are $n + 1$ individuals in the sample and that the means are eliminated), is given exactly by the formula³

$$nE(\delta s_{ij}\, \delta s_{km}) = \sigma_{ik}\sigma_{jm} + \sigma_{im}\sigma_{jk},$$

in which the subscripts may have any values, equal or unequal. When this formula is applied to each of the nine terms of the product and the results are expressed in terms of the correlations $\rho_i$, there results the asymptotic expression for the covariance given by

$$nE(\delta r_1\, \delta r_2) = \tfrac{1}{2}\rho_1\rho_2(\rho_1^2 + \rho_2^2 + \rho_0^2 - 1) + \rho_0(1 - \rho_1^2 - \rho_2^2).$$

This method provides also one of the derivations of the familiar formula which may be written

$$n\sigma_{r_1}^2 = nE(\delta r_1)^2 = (1 - \rho_1^2)^2, \qquad n\sigma_{r_2}^2 = (1 - \rho_2^2)^2.$$

The variance of the difference of $r_1$ and $r_2$ is the sum of their variances minus twice their covariance. Hence

$$n\sigma_{r_1 - r_2}^2 = (1 - \rho_1^2)^2 + (1 - \rho_2^2)^2 - \rho_1\rho_2(\rho_1^2 + \rho_2^2 + \rho_0^2 - 1) + 2\rho_0(\rho_1^2 + \rho_2^2 - 1).$$


We are testing the hypothesis that $\rho_1 = \rho_2$. If we put a common value $\rho$ for them in the last expression and simplify, we obtain for the standard error of the difference,

$$\sigma_{r_1 - r_2} = \sqrt{\frac{(1 - \rho_0)(2 - 3\rho^2 + \rho_0\rho^2)}{n}}.$$

The second factor in parentheses is always positive because of the inequalities limiting the correlations among three variates.
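
The simplification to a common value $\rho$ can be verified numerically. The sketch below uses hypothetical function names and arbitrary illustrative parameter values; it codes the general expression for $n\sigma^2_{r_1 - r_2}$ and the factored form $(1 - \rho_0)(2 - 3\rho^2 + \rho_0\rho^2)/n$ for the equal-correlation case, so that the two can be compared.

```python
import math

def n_var_diff(rho1, rho2, rho0):
    """n times the asymptotic variance of r1 - r2 (general expression)."""
    return ((1 - rho1**2) ** 2 + (1 - rho2**2) ** 2
            - rho1 * rho2 * (rho1**2 + rho2**2 + rho0**2 - 1)
            + 2 * rho0 * (rho1**2 + rho2**2 - 1))

def se_diff_equal(rho, rho0, n):
    """Asymptotic standard error of r1 - r2 when rho1 = rho2 = rho."""
    return math.sqrt((1 - rho0) * (2 - 3 * rho**2 + rho0 * rho**2) / n)
```

For instance, `math.sqrt(n_var_diff(0.6, 0.6, 0.4) / 12)` and `se_diff_equal(0.6, 0.4, 12)` should agree to rounding error. Both still require values for the unknown $\rho$ and $\rho_0$.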

This formula contains two unknown parameters, $\rho$ and $\rho_0$. The classical procedure would be to substitute $r_1$, $r_2$ and $r_0$ respectively for $\rho_1$, $\rho_2$, and $\rho_0$ in the previous formula, and use the resulting standard error expression as if the ratio to it of $r_1 - r_2$ were normally distributed. A first modification, more in line with modern ideas, would be to use some kind of average of $r_1$ and $r_2$ as an estimate of both $\rho_1$ and $\rho_2$, since the null hypothesis tested is that these are equal. But whatever sample estimates we substitute for $\rho$ and $\rho_0$, the formula remains unsatisfactory, since no suitable limits of error are available. If instead of the standard error we were to work out the exact distribution of $r_1 - r_2$ we should still not be free from the difficulty. This exact distribution clearly involves both $\rho$ and $\rho_0$, since its variance does so. Neither can we escape from the trouble by using some function $z = f(r)$, such as the inverse hyperbolic tangent suggested by R. A. Fisher, and considering the standard error of $z_1 - z_2 = f(r_1) - f(r_2)$, for this standard error will have as the first term in its expansion in a series of powers of $n^{-1}$ simply the product of the expression above for $\sigma_{r_1 - r_2}$ by $f'(\rho)$, and this must clearly involve both $\rho_0$ and $\rho$.

3 I have given a derivation of this formula from the characteristic function of the multivariate normal distribution [1]. Numerous special cases appear in earlier literature. The derivation above is a simplification and improvement of several versions appearing in the various early writings of Karl Pearson.

3. Nuisance Parameters. This is not by any means the only statistical problem in which unknown and undesired parameters enter into the distribution of the statistic which we should naturally use to test a hypothesis. Indeed, the early investigation which was perhaps most influential in setting the whole tone of modern statistical research was that [2] in which W. S. Gosset ("Student") arrived at the exact distribution of the ratio of a deviation in the mean to the estimated standard error. The previous practice (which unfortunately survives today in some quarters, and is even taught to students without explaining its approximate character) was to neglect the sampling errors in the estimate of the unknown variance $\sigma^2$ and to treat the ratio as normally distributed with unit variance. The rigorous derivation by Fisher [3] of the Student distribution makes clear the manner in which the nuisance parameter $\sigma$ may in this, and in some other, problems be eradicated from the distribution through integration, after altering the original statistic (the deviation in the mean) by dividing it by another statistic. The new statistic, the Student ratio, vanishes whenever the old statistic, the deviation in the mean, does so, and the same hypothesis is tested by both. This then is one way to get rid of a nuisance parameter: when you have a statistic estimating a parameter whose vanishing is in question, but whose distribution involves another parameter, alter the statistic by multiplying or dividing by another statistic in such a way that the new function vanishes whenever the old one does so; and do this in such a way that the new distribution will be independent of the nuisance parameter. Unhappily, this method has been applied successfully only in particular cases, and no way to use it in the problem at hand has been found.

A second method is that of transformation employed by Fisher in dealing with 
such problems as testing the significance of the difference between the correla¬ 
tion coefficients in independent samples between the same two variates. The 
need for the transformation in this case is occasioned by the presence in the
distribution of the difference of the sample correlations of the unknown true
value, which is not directly relevant to the comparison. We have seen that
this method also fails to solve our problem. 

A third method of dealing with nuisance parameters is the use of fiducial
probability by R. A. Fisher [4] and by Daisy M. Starkey [5] in testing the
significance of the difference between the means of two samples when the
variances may be unequal. Criticisms of these applications of fiducial probability
have been made by M. S. Bartlett [6] and B. L. Welch [7], and the field of
applicability of such methods is still in need of elucidation.

Some findings of J. Neyman [8] having a bearing on the general nuisance 
parameter problem should also be noted. 

The only other class of methods for dealing with nuisance parameters of which I am aware involves the comparison of the particular sample obtained, not with the whole population of samples with which a comparison might be made if we knew the value of the troublesome parameter, but with a sub-population selected with reference to the sample in such a way that the distribution, in this sub-population, of the statistic used does not involve any unknown parameter. An example is the testing of significance of a regression coefficient. Thus if we suppose that a sample of values of x and y is drawn from a bivariate normal population, and calculate the regression coefficient b of y on x in the sample, the distribution of b involves not only the population value $\beta$, but also the ratio a of the variances in the population. Since this second parameter is unknown, and can only be estimated from the sample, it is not possible to use the distribution of b in the whole population directly to test the significance of $b - \beta$. What we do is to find the place of this difference, not in the whole population of values in which both x and y are drawn at random, but in a sub-population for which the values of x are the same as in our sample. We may alternatively say that we limit the sub-population only to that for which the sum of the squares of the deviations of the values of x from their mean is the same as in our sample; the results are the same. The distribution in this sub-population of the ratio of $b - \beta$ to its estimated standard error is of the Student form, with no unknown parameters, and on this basis it is possible to make exact and satisfactory tests and to set up fiducial limits for $\beta$. Another example is that of contingency tables. The practice now accepted (after a controversy) for testing independence of two modes of classification, such as classification of persons according as they have or have not been vaccinated, and again according as they live through an epidemic or die, is to compare the observed contingency table, not with all possible contingency tables of the same numbers of rows and columns, but only with the possible contingency tables having exactly the same marginal totals as the observed table.

4. An Exact Solution. We shall solve the problem of the significance of the difference of $r_1$ and $r_2$ with the understanding that the meaning of significance is to be interpreted by reference to the sub-population of possible samples for which the predictors $x_1$ and $x_2$ have the same set of values as those observed in the particular sample available. This procedure, besides yielding an exact distribution without unknown parameters, has the advantage of relaxing the stringency of the requirement of a trivariate normal distribution. We now make only the assumptions customary in the method of least squares, that the predictand y has the univariate normal distribution for each set of values of $x_1$ and $x_2$, independently for the different sets, with a common variance $\sigma^2$, and with the expectation of y for a fixed pair of values of the predictors a linear function of these predictors. No assumption is involved regarding the distribution of the predictors, since we regard them as fixed in all the samples with which we compare our particular sample. The advantages of exactness and of freedom from the somewhat special trivariate normal assumption are attained at the expense of sacrificing the precise applicability of the results to other sets of values of the predictors.

Since the correlational properties are unchanged by additive and multiplicative constants, we may suppose that

$$Sx_1 = 0 = Sx_2, \qquad Sx_1^2 = 1 = Sx_2^2, \tag{1}$$

where S stands for summation over a sample of N individuals. The notation may be made more explicit by the adjunction of an additional subscript $\alpha$, varying from 1 to N, to denote the individual member of the sample, so that instead of $Sx_1$, for example, we might write $Sx_{1\alpha}$. The omission of this additional subscript is convenient and will usually leave no ambiguity when we deal with sums, but it will be convenient to retain it in connection with individual values. The correlation $r_0$ of $x_1$ with $x_2$ in all those samples we shall consider is, by (1),

$$r_0 = Sx_1x_2.$$

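Now consider the new quantities

$$x'_\alpha = \frac{x_{1\alpha} - x_{2\alpha}}{\sqrt{2(1 - r_0)}}, \qquad x''_\alpha = \frac{x_{1\alpha} + x_{2\alpha}}{\sqrt{2(1 + r_0)}}. \tag{2}$$

Evidently, from (1) and (2),

$$Sx' = 0 = Sx'', \qquad Sx'^2 = 1 = Sx''^2, \qquad Sx'x'' = 0. \tag{3}$$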
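
The relations (3) can be confirmed on any pair of predictor columns. The following sketch uses hypothetical helper names and arbitrary numbers supplied by the caller; it standardizes two columns as in (1), forms $x'$ and $x''$ as in (2), and leaves the sums in (3) to be inspected.

```python
import math

def standardize(v):
    """Shift and scale v so that S v = 0 and S v^2 = 1, as in (1)."""
    m = sum(v) / len(v)
    d = [x - m for x in v]
    norm = math.sqrt(sum(x * x for x in d))
    return [x / norm for x in d]

def rotate(x1, x2):
    """Return (x', x'', r0) from standardized predictors, as in (2).

    Assumes r0 is strictly between -1 and 1."""
    r0 = sum(a * b for a, b in zip(x1, x2))
    xp = [(a - b) / math.sqrt(2 * (1 - r0)) for a, b in zip(x1, x2)]
    xpp = [(a + b) / math.sqrt(2 * (1 + r0)) for a, b in zip(x1, x2)]
    return xp, xpp, r0
```

The orthogonality $Sx'x'' = 0$ follows because $S(x_1 - x_2)(x_1 + x_2) = Sx_1^2 - Sx_2^2 = 0$ under (1).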

Since the mean value $E(y_\alpha)$ is a linear function of $x_{1\alpha}$ and $x_{2\alpha}$, $y_\alpha$ may, upon subtracting a constant from all these expectations, be written

$$y_\alpha = \beta_1 x_{1\alpha} + \beta_2 x_{2\alpha} + \Delta_\alpha, \tag{4}$$

where $\Delta_1, \ldots, \Delta_N$ are normally and independently distributed with variances all equal to $\sigma^2$ and expectations zero. The assumption that $x_1$ and $x_2$ are equally correlated with y in the population leads to the conclusion that $\beta_1 = \beta_2$; and putting $\beta = \beta_1\sqrt{2(1 + r_0)}$, we then have from (4) and (2):

$$y_\alpha = \beta x''_\alpha + \Delta_\alpha. \tag{5}$$

Consequently, by (3),

$$Sx'y = Sx'_\alpha y_\alpha = \beta Sx'x'' + Sx'\Delta = Sx'\Delta;$$

and this function has a normal distribution with zero mean and variance $\sigma^2$. If in the sample we work out a regression equation

$$Y = a + b'x' + b''x'',$$

the normal equations for determining $b'$ and $b''$ must by (3) take the simple forms

$$a = \bar{y}, \qquad b' = Sx'y, \qquad b'' = Sx''y.$$

From the general theory of least squares it is known that the sum of squares of residuals is

$$Sv^2 = S(y - Y)^2 = Sy^2 - \bar{y}Sy - (Sx'y)^2 - (Sx''y)^2,$$

and that $Sv^2/\sigma^2$ has the $\chi^2$ distribution with $n = N - 3$ degrees of freedom, independently both of $Sx'y$ and of $Sx''y$. From these facts it follows that

$$t = Sx'y \sqrt{\frac{n}{Sv^2}} \tag{6}$$

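has the Student distribution with n degrees of freedom. Since in accordance with the foregoing definitions and (1) we have

$$Sx'y = (r_1 - r_2)\sqrt{\frac{S(y - \bar{y})^2}{2(1 - r_0)}},$$

and since also it is known that

$$Sv^2 = \frac{D}{1 - r_0^2}\, S(y - \bar{y})^2,$$

where

$$D = \begin{vmatrix} 1 & r_1 & r_2 \\ r_1 & 1 & r_0 \\ r_2 & r_0 & 1 \end{vmatrix},$$

(6) may be written

$$t = (r_1 - r_2)\sqrt{\frac{n(1 + r_0)}{2D}}. \tag{7}$$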

The probability of a greater value of $|t|$ is given by tables of the Student distribution with $n = N - 3$. If this probability is sufficiently small (which conventionally means less than .05, or sometimes .01) we have a corresponding degree of confidence that the variate chosen because of a higher correlation in the sample has actually a higher correlation than the other in the population.
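
As an illustration, the ox example of Section 2 ($r_1 = .7$, $r_2 = .5$, $r_0 = .4$, $N = 13$) can be worked through. The sketch below assumes the statistic has the form $t = (r_1 - r_2)\sqrt{n(1 + r_0)/(2D)}$ with $n = N - 3$ and $D = 1 - r_0^2 - r_1^2 - r_2^2 + 2r_0r_1r_2$ the determinant of the three correlations; the function name is hypothetical, and looking up the probability in Student tables is left to the reader.

```python
import math

def hotelling_t(r1, r2, r0, N):
    """Student-type ratio for comparing two correlations with a common
    predictand, with n = N - 3 degrees of freedom (assumed form of (7))."""
    n = N - 3
    # Determinant of the 3 x 3 matrix of correlations r0, r1, r2
    D = 1 - r0**2 - r1**2 - r2**2 + 2 * r0 * r1 * r2
    return (r1 - r2) * math.sqrt(n * (1 + r0) / (2 * D))

t = hotelling_t(0.7, 0.5, 0.4, 13)   # about 0.86 on 10 degrees of freedom
```

Here $D = .38$ and $t$ comes to about 0.86, well below the two-sided 5 per cent point of roughly 2.23 for 10 degrees of freedom, so on these figures the question would have to be left in abeyance.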


5. The Selection of One Variate from Among Three or More. Suppose that we are to choose one of the variates $x_1, \ldots, x_p$ in order to predict y ($p < N - 1$). We choose the one having highest correlation, and wonder how much confidence to place in this choice. We shall now determine the distribution of a function suitable for testing the hypothesis that there is no real difference between any pair of the correlations of $x_1, \ldots, x_p$ with y. Again we shall assume the values of these predictors fixed, and look for the place of our particular sample among all samples having these values, with only y free to vary normally by chance.

Let $a_{ij} = S(x_i - \bar{x}_i)(x_j - \bar{x}_j)$, and let $c_{ij}$ be the cofactor of $a_{ij}$ in the determinant a of these quantities, divided by a. Then

$$\sum_i a_{ij} c_{ik} = \delta_{jk} = \begin{cases} 1 & \text{if } j = k, \\ 0 & \text{if } j \neq k. \end{cases} \tag{8}$$

Here $\sum$ stands for summation from 1 to p. Let

$$w_i = \frac{\sum_j c_{ij}}{\sum\sum c_{ij}}, \tag{9}$$

$$l_i = S(x_i - \bar{x}_i)y, \tag{10}$$

$$l = \sum w_i l_i. \tag{11}$$

From (9) it follows that

$$\sum w_i = 1. \tag{12}$$

From the hypothesis that y is in the population equally correlated with all the
$x_i$ it follows that $l_1, \cdots, l_p$ have equal expectations, which we may denote by
$\lambda$, and from (11) and (12) it follows that also $E(l) = \lambda$. Obviously

(13)  $E(l_i - \lambda)(l_j - \lambda) = \sigma^2 a_{ij}$,

where $\sigma^2$ is the variance of those values of y corresponding to a fixed set of
values of the x's. From (11), (13) and (9) we obtain

(14)  $E(l - \lambda)^2 = \dfrac{\sigma^2}{\sum\sum c_{ij}}$.


Since the $l_i$ are linear functions of the y's, they have the multivariate normal
distribution. From the theory of this distribution and the values (13) of the
covariances it follows that the distribution has the form

$(2\pi)^{-p/2} a^{-1/2} \sigma^{-p} e^{-T/2\sigma^2} \, dl_1 \cdots dl_p$,

where a is the determinant of the $a_{ij}$'s, and

$T = \sum\sum c_{ij} (l_i - \lambda)(l_j - \lambda)$.

We may introduce linear functions $l'_1, \cdots, l'_p$ of $l_1 - \lambda, \cdots, l_p - \lambda$ such that
$T = l'^2_1 + \cdots + l'^2_p$, and such that $l'^2_p = (l - \lambda)^2 \sum\sum c_{ij}$. Now
$(l'^2_1 + \cdots + l'^2_{p-1})/\sigma^2$ has the $\chi^2$ distribution with p - 1 degrees of freedom.
The numerator of this expression equals

$T - l'^2_p = \sum\sum c_{ij} (l_i - \lambda)(l_j - \lambda) - (l - \lambda)^2 \sum\sum c_{ij}$
$\qquad = \sum\sum c_{ij} l_i l_j - l^2 \sum\sum c_{ij}$
$\qquad = \sum\sum c_{ij} (l_i - l)(l_j - l)$.

The penultimate form shows that this function is independent of $\lambda$; the last,
as a positive definite form in the deviations of the l's from their weighted mean,
shows that sufficiently large values of the expression will reveal with definiteness
the inequality of the predicting powers of the p variates when this exists.

It is well known that the regression coefficients of y upon the set of variates
$x_1, \cdots, x_p$ are completely independent of the sum of squares $Sv^2$ of residuals
from the regression equation. Since the l's are linear functions of these regression
coefficients (namely the linear functions appearing in the normal equations),
they also are independent of $Sv^2$. Hence, if we put

$s_1^2 = \dfrac{\sum\sum c_{ij} l_i l_j - l^2 \sum\sum c_{ij}}{p - 1}$,

$s_2^2 = \dfrac{Sv^2}{N - p - 1}$,

the ratio $F = s_1^2/s_2^2$ will, in case of equality of the correlations of the various
x's with y, have the variance ratio distribution with $n_1 = p - 1$ and $n_2 = N - p - 1$
degrees of freedom. When p = 2 this test reduces exactly to (7), as it
should, and $F = t^2$.

In the numerical application of this method, the regression coefficients $b_i$
of y on $x_1, \cdots, x_p$ should first be worked out by the inverse matrix method.
The right-hand members of the normal equations are $l_1, \cdots, l_p$, the coefficients
in these equations are the $a_{ij}$, and the calculation of $s_1^2$ is simplified with the
help of the identity

$\sum\sum c_{ij} l_i l_j = \sum b_i l_i$.
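
The numerical procedure just described may be sketched in Python. Three points are assumptions of the sketch, not statements of the paper: each predictor is first reduced to zero mean and unit sum of squares, so that equal correlations with y imply equal expectations of the $l_i$; the normal equations are solved by a small Gaussian elimination written out in full; and the names `solve_linear` and `selection_f_ratio` are illustrative. The identity $\sum\sum c_{ij} l_i l_j = \sum b_i l_i$ is used for the numerator, and the residual sum of squares is taken as $S(y - \bar{y})^2 - \sum b_i l_i$.

```python
import math


def solve_linear(matrix, rhs):
    """Solve matrix * z = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    aug = [list(matrix[i]) + [rhs[i]] for i in range(n)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("the predictors are linearly dependent")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for k in range(col, n + 1):
                aug[r][k] -= factor * aug[col][k]
    z = [0.0] * n
    for i in range(n - 1, -1, -1):
        tail = sum(aug[i][k] * z[k] for k in range(i + 1, n))
        z[i] = (aug[i][n] - tail) / aug[i][i]
    return z


def selection_f_ratio(predictors, y):
    """Return (F, n1, n2) for the hypothesis of equal correlations with y.

    predictors: list of p columns, each a list of N values.
    y:          list of N values of the predictand.
    """
    p, n = len(predictors), len(y)
    if p < 2 or n - p - 1 <= 0:
        raise ValueError("need p >= 2 predictors and N > p + 1 observations")
    # Assumption: standardise every predictor to zero mean, unit sum of squares.
    cols = []
    for col in predictors:
        mean = sum(col) / n
        dev = [v - mean for v in col]
        norm = math.sqrt(sum(d * d for d in dev))
        if norm == 0.0:
            raise ValueError("a predictor is constant")
        cols.append([d / norm for d in dev])
    y_mean = sum(y) / n
    y_dev = [v - y_mean for v in y]
    a = [[sum(u * v for u, v in zip(ci, cj)) for cj in cols] for ci in cols]
    l = [sum(u * v for u, v in zip(ci, y_dev)) for ci in cols]
    b = solve_linear(a, l)            # b_i = sum_j c_ij l_j
    row_sums = solve_linear(a, [1.0] * p)   # sum_j c_ij for each i
    sum_c = sum(row_sums)             # double sum of the c_ij
    l_weighted = sum(r * li for r, li in zip(row_sums, l)) / sum_c
    explained = sum(bi * li for bi, li in zip(b, l))
    s1_sq = (explained - l_weighted ** 2 * sum_c) / (p - 1)
    s2_sq = (sum(d * d for d in y_dev) - explained) / (n - p - 1)
    return s1_sq / s2_sq, p - 1, n - p - 1
```

The returned ratio is to be compared with a tabulated critical value of the variance ratio distribution for the returned degrees of freedom; with two predictors it should agree with the square of the t of the preceding section.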

6. Selection of Additional Variates When Some Have Been Chosen. Suppose
now that q predictors have been included definitely in the regression equation,
and that one more is to be selected for inclusion among p additional predictors
that are available. The criterion now is that that one should be chosen
tentatively which has the highest partial correlation with the predictand,
eliminating those already definitely chosen; but the confidence to be placed in the
choice is to be judged by an adaptation of the criterion of the preceding section.
It is only necessary to consider the $a_{ij}$, $l_i$, $c_{ij}$ and $b_i$ (i, j = 1, ..., p) as
calculated from the new predictors and the deviations of y from the regression
equation on the predictors already adopted. Formulae may easily be derived
for the values of these quantities in terms of those already found and the sums
of products, so as to simplify the calculations. $Sv^2$ will now stand for the sum
of squares of residuals from the regression equation involving all the p + q
predictors. It is to be divided by N - p - q - 1 to obtain $s_2^2$. The numbers
of degrees of freedom with respect to which F is to be judged are now $n_1 = p - 1$
and $n_2 = N - p - q - 1$. When p = 2 this test, like that of the preceding
section, reduces to the use of the t-distribution of (7), with n = N - q - 3,
and the correlations standing for partial correlations eliminating the predictors
already definitely chosen.

A special instance in which this procedure is applicable is in economic time
series, in which time, in the form of orthogonal polynomials, must ordinarily be
"partialled out" in order that tests of significance may be sound.

7. Further Problems. It is natural to ask whether the foregoing work can be
extended to examine the soundness of the selection, on the basis of a greater
multiple correlation, of a particular set of two or more variates, chosen from
among several such sets. The simplest such problem that goes beyond what
has been done above deals with two sets, each of two predictors, having in a
sample multiple correlations R and R' with the predictand. The question is
whether the difference R - R' is significant.

Suppose that, in the interests of simplicity and the hope of attaining a solution
satisfactorily free from unknown parameters, we assume as before that the
predictors have a fixed set of values, the same in all samples. Since multiple
correlations are invariant under linear transformations of predictors, we may
without loss of generality assume that the predictors in each set are mutually
uncorrelated and have sums of squares equal to unity. Indeed, we may go
somewhat further in standardizing the sets of values to which consideration can
be confined without loss of generality, with the help of some ideas introduced
in the paper [1]. In the terminology of that paper, the variates in each set may
be considered canonical with respect to the relationship between the sets. This
means that linear functions $x_1$ and $x_2$ of the two variates in one set, and linear
functions $x'_1$ and $x'_2$ of those in the other set, can be chosen so as to satisfy not
only the conditions

(15)  $Sx_1 = Sx_2 = Sx'_1 = Sx'_2 = 0$,
      $Sx_1^2 = Sx_2^2 = Sx'^2_1 = Sx'^2_2 = 1$,
      $Sx_1 x_2 = 0 = Sx'_1 x'_2$,

but also the further conditions

(16)  $Sx_1 x'_2 = 0 = Sx_2 x'_1$.

This means that, for all the purposes in view, the two sets of predictors can be
characterized as to their mutual relationships by the values of the remaining
two sums of products, namely

$c_1 = Sx_1 x'_1, \quad c_2 = Sx_2 x'_2$.

In view of the conditions assumed earlier, $c_1$ and $c_2$ are what have been called
the canonical correlations between the two sets.

To the sets thus standardized, the predictand y is related in a manner expressed
by the population regression coefficients $\beta_1$ and $\beta_2$ of y on the first set, and $\beta'_1$
and $\beta'_2$ on the second. If we take y as having unit variance in the population,
the squared multiple correlation coefficients in the two cases will be

$\rho^2 = \beta_1^2 + \beta_2^2, \quad \rho'^2 = \beta'^2_1 + \beta'^2_2$.

The hypothesis to be tested is that $\rho = \rho'$. If $b_1, b_2, b'_1, b'_2$ denote the sample
estimates of the regression coefficients, the statistic appropriate for the test
would appear necessarily to be proportional to

$w = \tfrac{1}{2}(b_1^2 + b_2^2 - b'^2_1 - b'^2_2)$.

The sample regression coefficients are normally distributed, with population
correlations equal to the sample correlations among the corresponding predictors.
The variance of each is $\sigma^2$. Thus their joint distribution may be written down
at once, in a rather simple form in view of (15) and (16). From this it is possible
to determine directly the characteristic function $M(t) = E e^{tw}$ of w. If
we write $K(t) = \log M(t)$ we obtain

$2K(t) = \sum \{(\beta_j^2 - 2c_j\beta_j\beta'_j + \beta'^2_j)t^2 + (\beta_j^2 - \beta'^2_j)t\}\{1 - (1 - c_j^2)t^2\}^{-1}$
$\qquad\qquad - \sum \log\{1 - (1 - c_j^2)t^2\}$.

Here the summations are with respect to j over the values 1 and 2. If each set
of predictors had had s members, the same result would hold for K(t) except
that the summations with respect to j would then extend from 1 to s.

This is a very disappointing result because it contains so many parameters.
The distribution of w must contain the same parameters as its characteristic
function. All the four parameters $\beta_j$, $\beta'_j$ appear in the expression above, though
their effective number is reduced to three by the condition that the two sums
of squares shall be equal, which constitutes the hypothesis under test. The
distribution of w thus contains at least three unknown parameters besides $\sigma$.

The estimate of variance $s^2$ obtained from the residuals from the grand
regression equation of y on $x_1, x_2, x'_1$, and $x'_2$ is independent of w. Its distribution
is of the usual form and involves a parameter, the population variance,
which is a function of $\beta_1, \beta_2, \beta'_1$, and $\beta'_2$. We could therefore pass by a single
integration from the distribution of w to that of the statistic $w/s^2$, which vanishes
with w, and which on this account, and on grounds of physical dimensionality,
might be considered appropriate to test the hypothesis that $\rho = \rho'$. The question
may be raised whether the distribution of this ratio might not be free from
parameters. The answer unfortunately is in the negative, as appears from an
examination of the characteristic function of the ratio. Even in the simplified
case in which all the $c_j$ are equal, a troublesome parameter persists in the
distribution.

Thus we meet again the problem of nuisance parameters, and this time no
escape is visible. Perhaps some such artifice as those enumerated in paragraph
3 (for example, some further limitation of the sub-population within which we
should seek the place of our particular sample) is capable of yielding an exact,
or "studentized" distribution, but this has not yet been found. The problem
is of considerable interest, not only because of its practical importance, but
because of its suggestiveness in connection with general theory.

Numerous other problems having both practical importance and general
theoretical interest are associated with the selection of predictors. For example,
we have not dealt at all with the problem of the number of predictors that
should be used when maximum accuracy in prediction, or in evaluation of the
regression coefficients, is the sole criterion. A particular case is the determination
of the degree of the regression polynomial which should be fitted to obtain
maximum accuracy, for example of the number of orthogonal polynomials in
fitting a trend. Such customary criteria as minimizing the estimated variance
of deviations, in which the sum of squares which is the numerator and the
number of degrees of freedom which is the denominator both diminish to zero
as the number of variates is increased, do not rest upon any satisfactory general
theory.

Another related set of problems is concerned with variates more numerous
than the observations on each. It is clear that there is real information inherent
in data of this kind, but existing theory and methods, including those of
the present paper, are not adequate to utilize it in a thoroughly efficient manner.
A recent paper of P. L. Hsu [9] is unique in not excluding the case in which the
variates outnumber the observations.

8. Summary. A criterion has been obtained for judging the definiteness of
the selection of a particular variate, from among several available for prediction,
on the basis of its having the maximum sample correlation with the predictand.
A variation of this criterion is applied in paragraph 6 to the problem of extending
the list of variates to be used in a regression formula.
Some of the problems of "nuisance parameters" which affect general theory
are illustrated in this problem. Some outstanding unsolved problems related
to these questions are discussed in paragraph 7.


THE FITTING OF STRAIGHT LINES IF BOTH VARIABLES ARE
SUBJECT TO ERROR

By Abraham Wald 

1. Introduction. The problem of fitting straight lines if both variables x
and y are subject to error has been treated by many authors. If we have N > 2
observed points $(x_i, y_i)$ (i = 1, ..., N), the usually employed method of least
squares for determining the coefficients a, b of the straight line y = ax + b
is that of choosing values of a and b which minimize the sum of the squares of
the residuals of the y's, i.e. $\sum (ax_i + b - y_i)^2$ is a minimum. It is well known
that treating y as an independent variable and minimizing the sum of the
squares of the residuals of the x's, we get a different straight line as best fit. It
has been pointed out$^{1}$ that if both variables are subject to error there is no
reason to prefer one of the regression lines described above to the other. For
obtaining the "best fit," which is not necessarily equal to one of the two lines
mentioned, new criteria have to be found. This problem was treated by R. J.
Adcock as early as 1877.$^{2}$

He defines the line of best fit as the one for which the sum of the squares of
the normal deviates of the N observed points from the line becomes a minimum.
(Another early attempt to solve this problem by minimizing the sum of squares
of the normal deviates was made by Karl Pearson.$^{3}$)

Many objections can be raised against this method. First, there is no
justification for minimizing the sum of the squares of the normal deviates, and not
the deviations in some other direction. Second, the straight line obtained by
that method is not invariant under transformation of the coordinate system.
It is clear that a satisfactory method should give results which do not depend
on the choice of a particular coordinate system. This point has been emphasized
by C. F. Roos. He gives$^{4}$ a good summary of the different methods and
then proposes a general formula for fitting lines (and planes in case of more than
two variables) which do not depend on the choice of the coordinate system.


$^{1}$ See for instance Henry Schultz, "The Statistical Law of Demand," Jour. of Political
Economy, Vol. 33, Dec. (1925).

$^{2}$ Analyst, Vol. IV, p. 183 and Vol. V, p. 53.

$^{3}$ "On Lines and Planes of Closest Fit to Systems of Points in Space," Phil. Mag., 6th
Ser., Vol. II (1901).

$^{4}$ "A General Invariant Criterion of Fit for Lines and Planes where all Variates are
Subject to Error," Metron, February 1937. See also Oppenheim and Roos, Bulletin of the
American Mathematical Society, Vol. 34 (1928), pp. 140-141.

Roos' formula includes many previous solutions$^{5}$ as special cases. H. E. Jones$^{6}$
gives an interesting geometric interpretation of Roos' general formula.

It is a common feature of Roos' general formula and of all other methods
proposed in recent years that the fitted straight line cannot be determined
without a priori assumptions (independent of the observations) regarding the
weights of the errors in the variables x and y. That is to say, either the standard
deviations of the errors in x and in y are involved (or at least their ratio is
included) in the formula of the fitted straight line and there is no method given
by which those standard deviations can be estimated by means of the observed
values of x and y.

R. Frisch$^{7}$ has developed a new general theory of linear regression analysis,
when all variables are subject to error. His very interesting theory employs
quite new methods and is not based on probability concepts. Also on the basis
of Frisch's discussion it seems that there is no way of determining the "true"
regression without a priori assumptions about the disturbing intensities.

T. Koopmans$^{8}$ combined Frisch's regression theory with the classical one in
a new general theory based on probability concepts. Also, according to his
theory, the regression line can be determined only if the ratio of the standard
deviations of the errors is known.

In a recent paper R. G. D. Allen$^{9}$ gives a new interesting method for
determining the fitted straight line in case of two variables x and y. Denoting by $\sigma_\epsilon$
the standard deviation of the errors in x, by $\sigma_\eta$ the standard deviation of the
errors in y and by $\rho$ the correlation coefficient between the errors in the two
variables, Allen emphasizes (p. 194)$^{9}$ that the fitted line can be determined only
if the values of two of the three quantities $\sigma_\epsilon$, $\sigma_\eta$, $\rho$ are given a priori.

Finally I should like to mention a paper by C. Eisenhart,$^{10}$ which contains
many interesting remarks related to the subject treated here.

In the present paper I shall deal with the case of two variables x and y in
which the errors are uncorrelated. It will be shown that under certain
conditions:

(1) The fitted straight line can be determined without making a priori
assumptions (independent of the observed values x and y) regarding the standard
deviations of the errors.

(2) The standard deviation of the errors can be well estimated by means of 

$^{5}$ For instance also Corrado Gini's method described in his paper, "Sull'Interpolazione
di una Retta Quando i Valori della Variabile Indipendente sono Affetti da Errori
Accidentali," Metron, Vol. I, No. 3 (1921), pp. 63-82.

$^{6}$ "Some Geometrical Considerations in the General Theory of Fitting Lines and Planes,"
Metron, February 1937.

$^{7}$ Statistical Confluence Analysis by Means of Complete Regression Systems, Oslo, 1934.

$^{8}$ Linear Regression Analysis of Economic Time Series, Haarlem, 1937.

$^{9}$ "The Assumptions of Linear Regression," Economica, May 1939.

$^{10}$ "The interpretation of certain regression methods and their use in biological and
industrial research," Annals of Math. Stat., Vol. 10 (1939), pp. 162-186.

the observed values of x and y. The precision of the estimate increases with
the number of the observations and would give the exact values if the number
of observations were infinite. (See in this connection also condition V in
section 3.)

2. Formulation of the Problem. Let us begin with a precise formulation of
the problem. We consider two sets of random variables$^{11}$

$x_1, \cdots, x_N; \quad y_1, \cdots, y_N$.

Denote the expected value $E(x_i)$ of $x_i$ by $X_i$ and the expected value $E(y_i)$ of
$y_i$ by $Y_i$ (i = 1, ..., N). We shall call $X_i$ the true value of $x_i$, $Y_i$ the true
value of $y_i$, $x_i - X_i = \epsilon_i$ the error in the i-th term of the x-set, and $y_i - Y_i = \eta_i$
the error in the i-th term of the y-set.

The following assumptions will be made:

I. The random variables $\epsilon_1, \cdots, \epsilon_N$ each have the same distribution and they
are uncorrelated, i.e. $E(\epsilon_i \epsilon_j) = 0$ for $i \neq j$. The variance of $\epsilon_i$ is finite.

II. The random variables $\eta_1, \cdots, \eta_N$ each have the same distribution and are
uncorrelated, i.e. $E(\eta_i \eta_j) = 0$ for $i \neq j$. The variance of $\eta_i$ is finite.

III. The random variables $\epsilon_i$ and $\eta_j$ (i = 1, ..., N; j = 1, ..., N) are
uncorrelated, i.e. $E(\epsilon_i \eta_j) = 0$.

IV. A single linear relation holds between the true values X and Y, that is to
say $Y_i = \alpha X_i + \beta$ (i = 1, ..., N).

Denote by $\epsilon$ a random variable having the same probability distribution as
possessed by each of the random variables $\epsilon_1, \cdots, \epsilon_N$, and by $\eta$ a random
variable having the same distribution as $\eta_1, \cdots, \eta_N$.

The problem to be solved can be formulated as follows:

We know only two sets of observations: $x'_1, \cdots, x'_N$; $y'_1, \cdots, y'_N$, where $x'_i$
denotes the observed value of $x_i$ and $y'_i$ denotes the observed value of $y_i$. We
know neither the true values $X_1, \cdots, X_N$; $Y_1, \cdots, Y_N$, nor the coefficients
$\alpha$ and $\beta$ of the linear relation between them. We have to estimate by means
of the observations $x'_1, \cdots, x'_N$; $y'_1, \cdots, y'_N$, (1) the values of $\alpha$ and $\beta$, (2) the
standard deviation $\sigma_\epsilon$ of $\epsilon$, and (3) the standard deviation $\sigma_\eta$ of $\eta$.

Problems of this kind occur often in Economics, where we are dealing with
time series. For example, denote by $x_i$ the price of a certain good G in the
period $t_i$, and by $y_i$ the quantity of G demanded in $t_i$. In each time period $t_i$
there exists a normal price $X_i$ and a normal demand $Y_i$ which would obtain if
the influence of some accidental disturbances could be eliminated. If we have
reason to assume that there exists between the normal price and the normal
demand a linear relationship we have to deal with a problem of the kind
described above.

In the following discussions we shall use the notations $x_i$ and $y_i$ also for their
observed values $x'_i$ and $y'_i$ since it will be clear in which sense they are meant
and no confusion can arise.

$^{11}$ A random or stochastic variable is a real variable associated with a probability
distribution.


3. Consistent Estimates of the Parameters $\alpha, \beta, \sigma_\epsilon, \sigma_\eta$. For the sake of
simplicity we assume that N is even. We consider the expressions

(1)  $a_1 = \dfrac{(x_1 + \cdots + x_m) - (x_{m+1} + \cdots + x_N)}{N}$,

(2)  $a_2 = \dfrac{(y_1 + \cdots + y_m) - (y_{m+1} + \cdots + y_N)}{N}$,

where m = N/2. As an estimate of $\alpha$ we shall use the expression

$a = \dfrac{a_2}{a_1} = \dfrac{(y_1 + \cdots + y_m) - (y_{m+1} + \cdots + y_N)}{(x_1 + \cdots + x_m) - (x_{m+1} + \cdots + x_N)}$.

We make the assumption

V. The limit inferior of

$\left| \dfrac{(X_1 + \cdots + X_m) - (X_{m+1} + \cdots + X_N)}{N} \right|$  (N = 2, 3, ... ad inf.)

is positive.

We shall prove that a is a consistent estimate of $\alpha$, i.e. a converges
stochastically to $\alpha$ with $N \to \infty$, if the assumptions I-V hold. Denote the expected
value of $a_1$ by $\bar{a}_1$ and the expected value of $a_2$ by $\bar{a}_2$. It is obvious that

(3)  $\bar{a}_1 = \dfrac{(X_1 + \cdots + X_m) - (X_{m+1} + \cdots + X_N)}{N}, \quad \bar{a}_2 = \dfrac{(Y_1 + \cdots + Y_m) - (Y_{m+1} + \cdots + Y_N)}{N}$.

On account of the condition IV we have

(4)  $\bar{a}_2 = \alpha \bar{a}_1$, or $\bar{a}_2 / \bar{a}_1 = \alpha$.

The variance of $a_1 - \bar{a}_1$ is equal to $\sigma_\epsilon^2/N$ and the variance of $a_2 - \bar{a}_2$ is equal
to $\sigma_\eta^2/N$. Hence $a_1$ and $a_2$ converge stochastically towards $\bar{a}_1$ and $\bar{a}_2$ respectively.
From that and assumption V it follows that also $a_2/a_1$ converges stochastically
towards $\bar{a}_2/\bar{a}_1 = \alpha$. The intercept $\beta$ of the regression line will be estimated by

(5)  $b = \bar{y} - a\bar{x}$, where $\bar{x} = \dfrac{x_1 + \cdots + x_N}{N}$ and $\bar{y} = \dfrac{y_1 + \cdots + y_N}{N}$.

Denote by $\bar{X}$ the arithmetic mean of $X_1, \cdots, X_N$ and by $\bar{Y}$ the arithmetic
mean of $Y_1, \cdots, Y_N$. Since $\bar{y}$ converges stochastically towards $\bar{Y}$, $\bar{x}$ towards
$\bar{X}$, and a towards $\alpha$, b converges stochastically towards $\bar{Y} - \alpha\bar{X}$. From
condition IV it follows that $\bar{Y} - \alpha\bar{X} = \beta$. Hence b converges stochastically
towards $\beta$.
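
The estimates of this section can be computed in a few lines; the following Python sketch does so. It takes the observations in the order given, so that the first N/2 pairs form one group and the remaining pairs the other, and it assumes N even as in the text. The function name `wald_fit` is illustrative only.

```python
def wald_fit(x, y):
    """Return (a, b): the slope a = a2/a1 and the intercept b = ybar - a*xbar.

    x, y: sequences of equal, even length N; the first N/2 observations
    form the first group and the rest the second group.
    """
    n = len(x)
    if n != len(y) or n < 2 or n % 2:
        raise ValueError("need an even number N >= 2 of paired observations")
    m = n // 2
    a1 = (sum(x[:m]) - sum(x[m:])) / n   # expression (1)
    a2 = (sum(y[:m]) - sum(y[m:])) / n   # expression (2)
    if a1 == 0:
        raise ZeroDivisionError("the two group sums of x coincide, a1 = 0")
    a = a2 / a1
    b = sum(y) / n - a * sum(x) / n      # expression (5)
    return a, b
```

For points lying exactly on y = 2x + 1, for example x = 1, ..., 6, the function returns a = 2 and b = 1. How the observations are ordered before the split matters for the precision of a, and the split must not depend on the errors.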

Let us introduce the following notations:

$s_x = \sqrt{\dfrac{\sum (x_i - \bar{x})^2}{N}}$ = sample standard deviation of the x-observations,

$s_y = \sqrt{\dfrac{\sum (y_i - \bar{y})^2}{N}}$ = sample standard deviation of the y-observations,

$s_{xy} = \dfrac{\sum (x_i - \bar{x})(y_i - \bar{y})}{N}$ = sample covariance between the x-set and y-set.

$s_X$, $s_Y$ and $s_{XY}$ denote the same expressions of the true values $X_1, \cdots, X_N$;
$Y_1, \cdots, Y_N$.

It is obvious that

(6)  $E(s_x^2) = s_X^2 + \sigma_\epsilon^2 \dfrac{N - 1}{N}$,

(7)  $E(s_y^2) = s_Y^2 + \sigma_\eta^2 \dfrac{N - 1}{N}$,

(8)  $E(s_{xy}) = s_{XY}$,

where $E(s_x^2)$, $E(s_y^2)$, and $E(s_{xy})$ denote the expected values of $s_x^2$, $s_y^2$, and $s_{xy}$.
Since $Y_i = \alpha X_i + \beta$, we have

(9)  $s_Y = \alpha s_X$,

(10)  $s_{XY} = \alpha s_X^2$.

From (8), (9) and (10) we get

(11)  $s_X^2 = \dfrac{E(s_{xy})}{\alpha}$,

(12)  $s_Y^2 = \alpha E(s_{xy})$.

If we substitute in (6) and (7) for $s_X^2$ and $s_Y^2$ their values in (11) and (12),
we get

(13)  $\sigma_\epsilon^2 = \left[E(s_x^2) - \dfrac{E(s_{xy})}{\alpha}\right] N/(N - 1)$,

(14)  $\sigma_\eta^2 = [E(s_y^2) - \alpha E(s_{xy})] N/(N - 1)$.


$^{12}$ I observe that the equations (6), (7) and (8) are essentially the same as those
investigated by R. Frisch, Statistical Confluence Analysis, pp. 51-52. See also Allen's equations
(4), l.c. p. 194.

Since $s_x^2$, $s_y^2$, $s_{xy}$ converge stochastically towards their expected values and a
converges stochastically towards $\alpha$, the expressions

(15)  $\left[s_x^2 - \dfrac{s_{xy}}{a}\right] N/(N - 1)$

and

(16)  $[s_y^2 - a s_{xy}] N/(N - 1)$

are consistent estimates of $\sigma_\epsilon^2$ and $\sigma_\eta^2$ respectively.
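
A short Python sketch of the estimates (15) and (16) follows. It assumes that (15) is $[s_x^2 - s_{xy}/a]\,N/(N - 1)$ and (16) is $[s_y^2 - a s_{xy}]\,N/(N - 1)$, with a the grouping estimate of the slope and all moments taken with divisor N. The function name `wald_variance_estimates` is illustrative only.

```python
def wald_variance_estimates(x, y):
    """Return (a, var_eps, var_eta) following (15) and (16).

    x, y: sequences of equal, even length N, split into halves in the
    order given. var_eps estimates the error variance in x, var_eta
    the error variance in y.
    """
    n = len(x)
    if n != len(y) or n < 4 or n % 2:
        raise ValueError("need an even number N >= 4 of paired observations")
    m = n // 2
    a1 = (sum(x[:m]) - sum(x[m:])) / n
    a2 = (sum(y[:m]) - sum(y[m:])) / n
    if a1 == 0 or a2 == 0:
        raise ZeroDivisionError("the slope estimate is undefined or zero")
    a = a2 / a1
    x_mean, y_mean = sum(x) / n, sum(y) / n
    # Sample moments with divisor N, as in the notation of the text.
    sx2 = sum((u - x_mean) ** 2 for u in x) / n
    sy2 = sum((v - y_mean) ** 2 for v in y) / n
    sxy = sum((u - x_mean) * (v - y_mean) for u, v in zip(x, y)) / n
    scale = n / (n - 1)
    var_eps = (sx2 - sxy / a) * scale   # expression (15)
    var_eta = (sy2 - a * sxy) * scale   # expression (16)
    return a, var_eps, var_eta
```

The estimates are consistent but are not constrained to be positive, so in a small sample one of them may come out negative; the data x = 1, 2, 3, 4 and y = 1, 3, 2, 6 give a = 1 with an estimate of -2/3 for the error variance in x.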


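4. Confidence Interval for $\alpha$. In this section, as well as in sections 5 and 6,
only the assumptions I-IV are assumed to hold. In other words, all statements
made in these sections are valid independently of Assumption V, except
where the contrary is explicitly stated.

Let us introduce the following notation:

$\bar{x}_1 = \dfrac{x_1 + \cdots + x_m}{m}, \quad \bar{y}_1 = \dfrac{y_1 + \cdots + y_m}{m}$,

$\bar{x}_2 = \dfrac{x_{m+1} + \cdots + x_N}{m}, \quad \bar{y}_2 = \dfrac{y_{m+1} + \cdots + y_N}{m}$,

$(s'_x)^2 = \dfrac{\sum_{i=1}^{m} (x_i - \bar{x}_1)^2 + \sum_{j=m+1}^{N} (x_j - \bar{x}_2)^2}{N}$,

$(s'_y)^2 = \dfrac{\sum_{i=1}^{m} (y_i - \bar{y}_1)^2 + \sum_{j=m+1}^{N} (y_j - \bar{y}_2)^2}{N}$,

$s'_{xy} = \dfrac{\sum_{i=1}^{m} (x_i - \bar{x}_1)(y_i - \bar{y}_1) + \sum_{j=m+1}^{N} (x_j - \bar{x}_2)(y_j - \bar{y}_2)}{N}$.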

$\bar{X}_1, \bar{X}_2, \bar{Y}_1, \bar{Y}_2, (s'_X)^2, (s'_Y)^2$ and $s'_{XY}$ denote the same functions of the true
values $X_1, \cdots, X_N$; $Y_1, \cdots, Y_N$. The expressions $s'_x$, $s'_y$, and $s'_{xy}$ are
slightly different from the corresponding expressions $s_x$, $s_y$, and $s_{xy}$. The
reason for introducing these new expressions is that the distributions of $s_x$,
$s_y$, and $s_{xy}$ are not independent of the slope $a = a_2/a_1$ of the sample regression
line, but $s'_x$, $s'_y$ and $s'_{xy}$ are distributed independently from a (assuming that $\epsilon$
and $\eta$ are normally distributed). The latter statement follows easily from the
fact that according to (1) and (2) $a = \dfrac{\bar{y}_1 - \bar{y}_2}{\bar{x}_1 - \bar{x}_2}$ and $s'_x$, $s'_y$, $s'_{xy}$ are distributed
independently of $\bar{x}_1$, $\bar{x}_2$, $\bar{y}_1$ and $\bar{y}_2$.

In the same way as we derived (13) and (14), we get

(13')  $\sigma_\epsilon^2 = \left[E(s'_x)^2 - \dfrac{E(s'_{xy})}{\alpha}\right] N/(N - 2)$,

(14')  $\sigma_\eta^2 = [E(s'_y)^2 - \alpha E(s'_{xy})] N/(N - 2)$.

These formulae differ from the corresponding formulae (13) and (14) only in
the denominator of the second factor, having there N - 2 instead of N - 1.
This is due to the fact that the estimates $s_x$, $s_y$, $s_{xy}$ are based on N - 1 degrees
of freedom whereas $s'_x$, $s'_y$ and $s'_{xy}$ are based only on N - 2 degrees of freedom.
From (13') and (14') we get the following estimates$^{13}$ for $\sigma_\epsilon^2$ and $\sigma_\eta^2$:


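(17)  $\left[(s'_x)^2 - \dfrac{s'_{xy}}{\alpha}\right] N/(N - 2)$,

(18)  $[(s'_y)^2 - \alpha s'_{xy}] N/(N - 2)$.

Hence we get as an estimate of $\sigma_\eta^2 + \alpha^2\sigma_\epsilon^2$ the expression

(19)  $s^2 = [(s'_y)^2 + \alpha^2 (s'_x)^2 - 2\alpha s'_{xy}] N/(N - 2)$
$\qquad = \dfrac{\sum_{i=1}^{m} [(y_i - \alpha x_i) - (\bar{y}_1 - \alpha\bar{x}_1)]^2 + \sum_{j=m+1}^{N} [(y_j - \alpha x_j) - (\bar{y}_2 - \alpha\bar{x}_2)]^2}{N - 2}$.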
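Now we shall show that

(20)  $\dfrac{(N - 2) s^2}{\sigma_\eta^2 + \alpha^2\sigma_\epsilon^2}$

has the $\chi^2$-distribution with N - 2 degrees of freedom, provided that $\epsilon$ and $\eta$
are normally distributed. In fact,

$(y_i - \alpha x_i) - (\bar{y}_1 - \alpha\bar{x}_1) = \eta_i - \alpha\epsilon_i - (\bar{\eta}_1 - \alpha\bar{\epsilon}_1)$  (i = 1, ..., m)

and

$(y_j - \alpha x_j) - (\bar{y}_2 - \alpha\bar{x}_2) = \eta_j - \alpha\epsilon_j - (\bar{\eta}_2 - \alpha\bar{\epsilon}_2)$  (j = m + 1, ..., N),

where

$\bar{\epsilon}_1 = \dfrac{\epsilon_1 + \cdots + \epsilon_m}{m}, \quad \bar{\epsilon}_2 = \dfrac{\epsilon_{m+1} + \cdots + \epsilon_N}{m}$,

$\bar{\eta}_1 = \dfrac{\eta_1 + \cdots + \eta_m}{m}, \quad \bar{\eta}_2 = \dfrac{\eta_{m+1} + \cdots + \eta_N}{m}$.

Since the variance of $\eta_k - \alpha\epsilon_k$ is equal to $\sigma_\eta^2 + \alpha^2\sigma_\epsilon^2$ and since $\eta_k - \alpha\epsilon_k$ is
uncorrelated with $\eta_l - \alpha\epsilon_l$ ($k \neq l$) (k, l = 1, ..., N), the expression (20) has the
$\chi^2$-distribution with N - 2 degrees of freedom.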


$^{13}$ An "estimate" is usually a function of the observations not involving any unknown
parameters. We designate here as estimates also some functions involving the parameter $\alpha$.

Now we shall show that

(21)  $\dfrac{\sqrt{N}\, a_1 (a - \alpha)}{\sqrt{\sigma_\eta^2 + \alpha^2\sigma_\epsilon^2}}$

is normally distributed with zero mean and unit variance. In fact from the
equations (1)-(4) it follows that

$a_1 (a - \alpha) = a_2 - \alpha a_1 = \bar{a}_2 + \dfrac{\bar{\eta}_1 - \bar{\eta}_2}{2} - \alpha\left(\bar{a}_1 + \dfrac{\bar{\epsilon}_1 - \bar{\epsilon}_2}{2}\right) = \dfrac{\bar{\eta}_1 - \bar{\eta}_2}{2} - \alpha\,\dfrac{\bar{\epsilon}_1 - \bar{\epsilon}_2}{2}$.

Since the latter expression is normally distributed (provided that $\epsilon$ and $\eta$ are
normally distributed) with zero mean and variance $\dfrac{\sigma_\eta^2 + \alpha^2\sigma_\epsilon^2}{N}$, our statement
about (21) is proved.

Obviously (20) and (21) are independently distributed, hence $\sqrt{N - 2}$ times
the ratio of (21) to the square root of (20), namely,

(22)  $t = \sqrt{N - 2}\,\dfrac{\sqrt{N}\, a_1 (a - \alpha)}{\sqrt{N - 2}\; s} = \dfrac{a_1 (a - \alpha) \sqrt{N - 2}}{\sqrt{(s'_y)^2 + \alpha^2 (s'_x)^2 - 2\alpha s'_{xy}}}$

has the Student distribution with N - 2 degrees of freedom. Denote by $t_0$ the
critical value of t corresponding to a chosen probability level. The deviation
of a from an assumed population value $\alpha$ is significant if

$\left| \dfrac{a_1 (a - \alpha) \sqrt{N - 2}}{\sqrt{(s'_y)^2 + \alpha^2 (s'_x)^2 - 2\alpha s'_{xy}}} \right| \geq t_0$.

The confidence interval for $\alpha$ can be obtained by solving the equation in $\alpha$,

(23)  $a_1^2 (a - \alpha)^2 = [(s'_y)^2 + \alpha^2 (s'_x)^2 - 2\alpha s'_{xy}]\, \dfrac{t_0^2}{N - 2}$.


Now we shall show that if the relation

(24)  $a_1^2 > \dfrac{(s'_x)^2\, t_0^2}{N - 2}$

holds, the roots $\alpha_1$ and $\alpha_2$ are real and a is contained in the interior of the interval
$[\alpha_1, \alpha_2]$. From (19) it follows that

$(s'_y)^2 + \alpha^2 (s'_x)^2 - 2\alpha s'_{xy} > 0$

for all values of $\alpha$. Hence, for $\alpha = a$ the left hand side of (23) is smaller than
the right hand side. On account of (24) there exists a value $\alpha' > a$ and a
value $\alpha'' < a$ such that the left hand side of (23) is greater than the right hand
side for $\alpha = \alpha'$ and $\alpha = \alpha''$. Hence one root must lie between a and $\alpha'$ and the
other root between $\alpha''$ and a. This proves our statement. The relation (24)
always holds for sufficiently large N if Assumption V is fulfilled. The confidence
interval of $\alpha$ is the interval $[\alpha_1, \alpha_2]$. For very small N (24) may not hold.
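
The solution of (23) may be sketched in Python as below. The sketch assumes that (23) reads $a_1^2 (a - \alpha)^2 = [(s'_y)^2 + \alpha^2 (s'_x)^2 - 2\alpha s'_{xy}]\, t_0^2/(N - 2)$ and that (24) is the condition $a_1^2 > (s'_x)^2 t_0^2/(N - 2)$. Writing $k = t_0^2/(N - 2)$ and using $a_1 a = a_2$, equation (23) becomes the quadratic $(a_1^2 - k (s'_x)^2)\alpha^2 - 2(a_1 a_2 - k s'_{xy})\alpha + (a_2^2 - k (s'_y)^2) = 0$. The critical value $t_0$ must be supplied by the user from tables; the function names are illustrative only.

```python
import math


def pooled_within_moments(x, y):
    """Return (a1, a2, sx2, sy2, sxy) for the two halves of the data.

    sx2, sy2, sxy are the primed moments of section 4: sums of squares
    and products about the two group means, divided by N.
    """
    n = len(x)
    if n != len(y) or n < 4 or n % 2:
        raise ValueError("need an even number N >= 4 of paired observations")
    m = n // 2
    sx2 = sy2 = sxy = 0.0
    for lo, hi in ((0, m), (m, n)):
        gx, gy = x[lo:hi], y[lo:hi]
        mx, my = sum(gx) / m, sum(gy) / m
        sx2 += sum((u - mx) ** 2 for u in gx)
        sy2 += sum((v - my) ** 2 for v in gy)
        sxy += sum((u - mx) * (v - my) for u, v in zip(gx, gy))
    a1 = (sum(x[:m]) - sum(x[m:])) / n
    a2 = (sum(y[:m]) - sum(y[m:])) / n
    return a1, a2, sx2 / n, sy2 / n, sxy / n


def slope_confidence_interval(x, y, t0):
    """Return the two roots of (23) as (lower, upper) for critical value t0."""
    n = len(x)
    a1, a2, sx2, sy2, sxy = pooled_within_moments(x, y)
    k = t0 * t0 / (n - 2)
    qa = a1 * a1 - k * sx2
    if qa <= 0.0:
        raise ValueError("relation (24) does not hold for these data")
    qb = -2.0 * (a1 * a2 - k * sxy)
    qc = a2 * a2 - k * sy2
    # Under (24) the discriminant is non-negative; guard against rounding.
    root = math.sqrt(max(qb * qb - 4.0 * qa * qc, 0.0))
    return (-qb - root) / (2.0 * qa), (-qb + root) / (2.0 * qa)
```

With so few as four observations the tabulated 5 per cent value of t for 2 degrees of freedom already violates (24) for typical data, in which case the function raises an error and does not return an interval.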

Finally I should like to remark that no essentially better estimate of the
variance of $\eta - \alpha\epsilon$ can be given than the expression $s^2$ in (19). In fact, we
have 2N observations $x_1, \cdots, x_N$; $y_1, \cdots, y_N$. For the estimation of the
variance of $\eta - \alpha\epsilon$ we must eliminate the unknowns $X_1, \cdots, X_N$ and $\beta$. (The
unknowns $Y_1, \cdots, Y_N$ are determined by the relations $Y_i = \alpha X_i + \beta$ and $\alpha$ is
involved in the expression whose variance is to be determined.) Hence we have
at most N - 1 degrees of freedom and the estimate in (19) is based on N - 2
degrees of freedom.

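5. Confidence Interval for $\beta$ if $\alpha$ is Given. In this case the best estimate of $\beta$
is given by the expression:

$b_\alpha = \bar{y} - \alpha\bar{x}$, where $\bar{x} = \dfrac{x_1 + \cdots + x_N}{N}$ and $\bar{y} = \dfrac{y_1 + \cdots + y_N}{N}$.

We have

$b_\alpha - \beta = (\bar{y} - \bar{Y}) - \alpha(\bar{x} - \bar{X}) = \bar{\eta} - \alpha\bar{\epsilon}$,

where

$\bar{\epsilon} = \dfrac{\epsilon_1 + \cdots + \epsilon_N}{N}$ and $\bar{\eta} = \dfrac{\eta_1 + \cdots + \eta_N}{N}$.

Hence,

(25)  $\dfrac{\sqrt{N}\,(b_\alpha - \beta)}{\sqrt{\sigma_\eta^2 + \alpha^2\sigma_\epsilon^2}}$

is normally distributed with zero mean and unit variance. It is obvious that
the expressions (20) and (25) are independently distributed. Hence $\sqrt{N - 2}$
times the ratio of (25) to the square root of (20), i.e.

$t = \sqrt{N - 2}\,\dfrac{\sqrt{N}\,(b_\alpha - \beta)}{\sqrt{N - 2}\; s} = \dfrac{\sqrt{N - 2}\,(b_\alpha - \beta)}{\sqrt{(s'_y)^2 + \alpha^2 (s'_x)^2 - 2\alpha s'_{xy}}}$

has the Student distribution with N - 2 degrees of freedom. Denoting by $t_0$
the critical value of t according to the chosen probability level, the confidence
interval for $\beta$ is given by the interval:

$\left[\, b_\alpha - t_0 \sqrt{\dfrac{(s'_y)^2 + \alpha^2 (s'_x)^2 - 2\alpha s'_{xy}}{N - 2}}, \;\; b_\alpha + t_0 \sqrt{\dfrac{(s'_y)^2 + \alpha^2 (s'_x)^2 - 2\alpha s'_{xy}}{N - 2}} \,\right]$.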
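
The interval for the intercept when the slope is given may be sketched in Python as follows. The sketch assumes that the interval is $b_\alpha \pm t_0 \sqrt{Q/(N - 2)}$ with $b_\alpha = \bar{y} - \alpha\bar{x}$ and $Q = (s'_y)^2 + \alpha^2 (s'_x)^2 - 2\alpha s'_{xy}$, and it computes Q from the second form of (19), namely as the sum of squares of $y - \alpha x$ about its mean within each half of the data, divided by N. The critical value $t_0$ is supplied by the user; the function name is illustrative only.

```python
import math


def intercept_confidence_interval(x, y, alpha, t0):
    """Return (lower, upper) limits for beta when the slope alpha is given."""
    n = len(x)
    if n != len(y) or n < 4 or n % 2:
        raise ValueError("need an even number N >= 4 of paired observations")
    m = n // 2
    q = 0.0
    for lo, hi in ((0, m), (m, n)):
        # u = y - alpha * x, taken about its mean within the group.
        u = [yv - alpha * xv for xv, yv in zip(x[lo:hi], y[lo:hi])]
        u_mean = sum(u) / m
        q += sum((v - u_mean) ** 2 for v in u)
    q /= n  # equals (s'_y)^2 + alpha^2 (s'_x)^2 - 2 alpha s'_xy
    b_alpha = sum(y) / n - alpha * sum(x) / n
    half_width = t0 * math.sqrt(q / (n - 2))
    return b_alpha - half_width, b_alpha + half_width
```

For x = 1, 2, 3, 4 and y = 1, 3, 2, 6 with the slope taken as 1, the centre of the interval is 0.5 and Q is 1.25.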
6. Confidence Region for $\alpha$ and $\beta$ Jointly. In most practical cases we want to
know confidence limits for $\alpha$ and $\beta$ jointly. A pair of values $\alpha$, $\beta$ can be
represented in the plane by the point with the coordinates $\alpha$, $\beta$. A region R of this
plane is called confidence region of the true point $(\alpha, \beta)$ corresponding to the
probability level P if the following two conditions are fulfilled.

(1) The region R is a function of the observations $x_1, \cdots, x_N$; $y_1, \cdots, y_N$,
i.e. it is uniquely determined by the observations.

(2) Before performing the experiment the probability that we shall obtain
observed values such that $(\alpha, \beta)$ will be contained in R, is exactly equal to P.
P is usually chosen to be equal to .95 or .99.

We have shown that the expressions (21) and (25), i.e.

$\dfrac{\sqrt{N}\, a_1 (a - \alpha)}{\sqrt{\sigma_\eta^2 + \alpha^2\sigma_\epsilon^2}}$ and $\dfrac{\sqrt{N}\,(b_\alpha - \beta)}{\sqrt{\sigma_\eta^2 + \alpha^2\sigma_\epsilon^2}}$,

are normally distributed with zero mean and unit variance. Now we shall
show that these two quantities are independently distributed. For this purpose
we have only to show that $\bar{x}$, $\bar{y}$, $a_1$ and $a_2$ are independently distributed ($a_1$ and $a_2$
are defined in (1) and (2)), but since

$a_1 - E(a_1) = (\bar{\epsilon}_1 - \bar{\epsilon}_2)/2$,
$a_2 - E(a_2) = (\bar{\eta}_1 - \bar{\eta}_2)/2$,
$\bar{x} - E(\bar{x}) = \bar{\epsilon}$,
$\bar{y} - E(\bar{y}) = \bar{\eta}$,

we have only to show that $\bar{\epsilon}$, $\bar{\eta}$, $\bar{\epsilon}_1 - \bar{\epsilon}_2$, $\bar{\eta}_1 - \bar{\eta}_2$ are independently distributed.
We obviously have

$\bar{\epsilon} = \dfrac{\bar{\epsilon}_1 + \bar{\epsilon}_2}{2}, \quad \bar{\eta} = \dfrac{\bar{\eta}_1 + \bar{\eta}_2}{2}$.

It is evident that $\bar{\epsilon}_1$, $\bar{\epsilon}_2$, $\bar{\eta}_1$ and $\bar{\eta}_2$ are independently distributed. Hence,
$E[\bar{\epsilon}(\bar{\epsilon}_1 - \bar{\epsilon}_2)] = (E\bar{\epsilon}_1^2 - E\bar{\epsilon}_2^2)/2 = 0$ and also $E[\bar{\eta}(\bar{\eta}_1 - \bar{\eta}_2)] = (E\bar{\eta}_1^2 - E\bar{\eta}_2^2)/2 = 0$.
Since $\bar{\epsilon}_1 - \bar{\epsilon}_2$, $\bar{\eta}_1 - \bar{\eta}_2$, and $\bar{\epsilon}$ and $\bar{\eta}$ are normally distributed, the independence
of this set of variables is proved, and therefore also (21) and (25) are
independently distributed. It is obvious that the expression (20) is distributed
independently of (21) and (25). From this it follows that

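$$ \frac{N-2}{2}\cdot\frac{N\left[a_1^2(a-\alpha)^2 + (\bar y - \alpha\bar x - \beta)^2\right]}{(N-2)\,s^2} = \frac{(N-2)\left[a_1^2(a-\alpha)^2 + (\bar y - \alpha\bar x - \beta)^2\right]}{2\left[(s'_y)^2 + \alpha^2(s'_x)^2 - 2\alpha s'_{xy}\right]} \tag{26} $$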

has the $F$-distribution (analysis of variance distribution) with 2 and $N - 2$ degrees of freedom. The $F$-distribution is tabulated in Snedecor's book: Calculation and Interpretation of Analysis of Variance, Collegiate Press, Ames, Iowa, 1934. The distribution of $\tfrac{1}{2}\log F = z$ is tabulated in R. A. Fisher's book: Statistical Methods for Research Workers, London, 1936. Denote by $F_0$ the critical value of $F$ corresponding to the chosen probability level $P$. Then the confidence region $R$ is the set of points $(\alpha, \beta)$ which satisfy the inequality


$$ \frac{N-2}{2}\cdot\frac{a_1^2(a-\alpha)^2 + (\bar y - \alpha\bar x - \beta)^2}{(s'_y)^2 + \alpha^2(s'_x)^2 - 2\alpha s'_{xy}} \le F_0. \tag{27} $$


The boundary of the region is given by the equation

$$ a_1^2(a-\alpha)^2 + (\bar y - \alpha\bar x - \beta)^2 = \frac{2F_0}{N-2}\left[(s'_y)^2 + \alpha^2(s'_x)^2 - 2\alpha s'_{xy}\right], \tag{28} $$

where $F_0$ is the critical value of $F$ introduced above. This is the equation of an ellipse. Hence the region $R$ is the interior of the ellipse defined by the equation (28). If Assumption V holds, the lengths of the axes of the ellipse are of the order $1/\sqrt{N}$, hence with increasing $N$ the ellipse reduces to a point.
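The membership test implied by inequality (27) can be sketched in a few lines. This is only an illustration: the function names are hypothetical, the summary statistics ($a_1$, $a$, $\bar x$, $\bar y$, $(s'_x)^2$, $(s'_y)^2$, $s'_{xy}$) are assumed to have been computed beforehand as defined in the earlier sections, and the critical value $F_0$ is assumed to be read from a table and passed in by the caller.

```python
def joint_region_statistic(alpha, beta, *, n, a1, a, x_bar, y_bar, sx2, sy2, sxy):
    """Left-hand side of inequality (27) for a candidate point (alpha, beta).

    sx2, sy2 and sxy stand for (s'_x)^2, (s'_y)^2 and s'_xy; they are taken
    as given inputs here (an assumption of this sketch).
    """
    numerator = a1 ** 2 * (a - alpha) ** 2 + (y_bar - alpha * x_bar - beta) ** 2
    denominator = sy2 + alpha ** 2 * sx2 - 2.0 * alpha * sxy
    return (n - 2) / 2.0 * numerator / denominator


def in_joint_region(alpha, beta, f0, **summary):
    """True when (alpha, beta) satisfies (27) for the critical value f0."""
    return joint_region_statistic(alpha, beta, **summary) <= f0


# Hypothetical summary statistics, chosen only to keep the arithmetic simple.
summary = dict(n=12, a1=1.0, a=2.0, x_bar=0.0, y_bar=1.0,
               sx2=1.0, sy2=1.0, sxy=0.0)
stat = joint_region_statistic(2.0, 3.0, **summary)
```

Scanning a grid of candidate points $(\alpha, \beta)$ with `in_joint_region` would trace out the interior of the ellipse (28).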


7. The Grouping of the Observations. We have divided the observations into two equal groups $G_1$ and $G_2$, $G_1$ containing the first half $(x_1, y_1), \ldots, (x_m, y_m)$ and $G_2$ the second half $(x_{m+1}, y_{m+1}), \ldots, (x_N, y_N)$ of the observations. All the formulas and statements of the previous sections remain exactly valid for any arbitrary subdivision of the observations into two equal groups, provided that the subdivision is defined independently of the errors $\varepsilon_1, \ldots, \varepsilon_N$; $\eta_1, \ldots, \eta_N$. The question of which is the most advantageous grouping arises, i.e. for which grouping will $a$ be the most efficient estimate of $\alpha$ (will lead to the shortest confidence interval for $\alpha$). It is easy to see that the greater $|a_1|$, the more efficient is the estimate $a$ of $\alpha$. The expression $|a_1|$ becomes a maximum if we order the observations such that $x_1 \le x_2 \le \cdots \le x_N$. That is to say, $|a_1|$ becomes a maximum if we group the observations according to the following:

Rule I. The point $(x_i, y_i)$ belongs to the group $G_1$ if the number of elements $x_j$ $(j \ne i)$ of the series $x_1, \ldots, x_N$ for which $x_j \le x_i$ is less than $m = N/2$. The point $(x_i, y_i)$ belongs to $G_2$ if the number of elements $x_j$ $(j \ne i)$ for which $x_j \le x_i$ is greater than or equal to $m$.
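A minimal sketch of the grouping by Rule I and of the resulting estimates $a = (\bar y_1 - \bar y_2)/(\bar x_1 - \bar x_2)$ and $b = \bar y - a\bar x$ follows. The function names are hypothetical, $N$ is assumed even, and ties among the $x_i$ are assumed to be broken by the order of observation, a choice the text does not prescribe.

```python
def rule_one_groups(xs, ys):
    """Split the observation indices into G1 (m smallest x) and G2 (m largest x)."""
    n = len(xs)
    if n % 2 != 0 or n != len(ys):
        raise ValueError("need an even number of (x, y) pairs")
    # sorted() is stable, so tied x values keep their order of observation.
    order = sorted(range(n), key=lambda i: xs[i])
    m = n // 2
    return order[:m], order[m:]


def mean(values):
    return sum(values) / len(values)


def grouped_estimates(xs, ys, g1, g2):
    """Slope a and intercept b from the group means of G1 and G2."""
    x1, x2 = mean([xs[i] for i in g1]), mean([xs[i] for i in g2])
    y1, y2 = mean([ys[i] for i in g1]), mean([ys[i] for i in g2])
    a = (y1 - y2) / (x1 - x2)        # fails if the group means of x coincide
    b = mean(ys) - a * mean(xs)
    return a, b


# Hypothetical data lying exactly on y = 2x.
xs = [3.0, 1.0, 4.0, 2.0]
ys = [6.0, 2.0, 8.0, 4.0]
g1, g2 = rule_one_groups(xs, ys)
a, b = grouped_estimates(xs, ys, g1, g2)
```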

This grouping, however, depends on the observed values $x_1, \ldots, x_N$ and is therefore in general not entirely independent of the errors $\varepsilon_1, \ldots, \varepsilon_N$. Let us now consider the grouping according to the following:

Rule II. The point $(x_i, y_i)$ belongs to the group $G_1$ if the number of elements $X_j$ of the series $X_1, \ldots, X_N$ for which $X_j \le X_i$ $(j \ne i)$ is less than $m$. The point $(x_i, y_i)$ belongs to $G_2$ if the number of elements $X_j$ for which $X_j \le X_i$ $(j \ne i)$ is equal to or greater than $m$.
The grouping according to Rule II is entirely independent of the errors $\varepsilon_1, \ldots, \varepsilon_N$; $\eta_1, \ldots, \eta_N$. It is identical with the grouping according to Rule I in the following case: Denote by $\tilde x$ the median of $x_1, \ldots, x_N$; assume that $\varepsilon$ can take values only within the finite interval $[-c, +c]$ and that all the values $x_1, \ldots, x_N$ fall outside the interval $[\tilde x - c, \tilde x + c]$. It is easy to see that in this case $x_i \le \tilde x$ $(i = 1, \ldots, N)$ holds if and only if $X_i \le \tilde X$, where $\tilde X$ denotes the median of $X_1, \ldots, X_N$. Hence the grouping according to Rule II is identical to that according to Rule I and therefore the grouping according to Rule I is independent of the errors $\varepsilon_1, \ldots, \varepsilon_N$. In such cases we get the best estimate of $\alpha$ by grouping the observations according to Rule I. Practically, we can use the grouping according to Rule I and regard it as independent of the errors $\varepsilon_1, \ldots, \varepsilon_N$; $\eta_1, \ldots, \eta_N$ if there exists a positive value $c$ for which the probability that $|\varepsilon| > c$ is negligibly small and the number of observations contained in $[\tilde x - c, \tilde x + c]$ is also very small.

Denote by $a'$ the value of $a$ which we obtain by grouping the observations according to Rule I and by $a''$ the value of $a$ if we group the observations according to Rule II. The value $a''$ is in general unknown, since the values $X_1, \ldots, X_N$ are unknown, except in the special case considered above, when we have $a'' = a'$. We will now show that an upper and a lower limit for $a''$ can always be given. First, we have to determine a positive value $c$ such that the probability that $|\varepsilon| > c$ is negligibly small. The value of $c$ may often be determined before we make the observations, from some a priori knowledge about the possible range of the errors. If this is not the case, we can estimate the value of $c$ from the data. It is well known that if we have errors in both variables and fit a straight line by the method of least squares minimizing in the $x$-direction, the sum of the squared deviations divided by the number of degrees of freedom will overestimate $\sigma_\varepsilon^2$. Hence, if $\varepsilon$ is normally distributed, we can consider the interval $[-3v, 3v]$ as the possible range of $\varepsilon$, i.e. $c = 3v$, where $v^2$ denotes the sum of the squared residuals divided by the number of degrees of freedom. If the distribution of $\varepsilon$ is unknown, we shall have to take for $c$ a somewhat larger value, for instance $c = 5v$. After having determined $c$, upper and lower limits for $a''$ can be given as follows: we consider the system $S$ of all possible groupings satisfying the following conditions, where $\tilde x$ denotes the median of $x_1, \ldots, x_N$:

(1) If $x_i < \tilde x - c$ the point $(x_i, y_i)$ belongs to the group $G_1$.

(2) If $x_i > \tilde x + c$ the point $(x_i, y_i)$ belongs to the group $G_2$.

We calculate the value of $a$ according to each grouping of the system $S$ and denote the minimum of these values by $a^*$, and the maximum by $a^{**}$. Since the grouping according to Rule II is contained in the system $S$, $a^*$ is a lower and $a^{**}$ an upper limit of $a''$.
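The limits $a^*$ and $a^{**}$ can be obtained by brute-force enumeration of the system $S$, as in the sketch below. Several points are assumptions of the sketch rather than statements of the text: the name `slope_limits` is hypothetical, every grouping in $S$ is taken to consist of two groups of equal size $m = N/2$, the median of an even number of values is taken as the midpoint of the two middle values, and groupings whose group means of $x$ coincide are skipped.

```python
from itertools import combinations
from statistics import median


def slope_limits(xs, ys, c):
    """Return (a_star, a_double_star), the extreme slopes over the system S."""
    n = len(xs)
    if n % 2 != 0 or n != len(ys):
        raise ValueError("need an even number of (x, y) pairs")
    m = n // 2
    med = median(xs)
    forced_g1 = [i for i in range(n) if xs[i] < med - c]   # condition (1)
    forced_g2 = [i for i in range(n) if xs[i] > med + c]   # condition (2)
    free = [i for i in range(n) if i not in forced_g1 and i not in forced_g2]
    need = m - len(forced_g1)        # free points that still have to join G1
    if need < 0 or need > len(free):
        raise ValueError("conditions (1) and (2) leave no equal-sized grouping")
    slopes = []
    for extra in combinations(free, need):
        g1 = forced_g1 + list(extra)
        g2 = forced_g2 + [i for i in free if i not in extra]
        x1 = sum(xs[i] for i in g1) / m
        x2 = sum(xs[i] for i in g2) / m
        y1 = sum(ys[i] for i in g1) / m
        y2 = sum(ys[i] for i in g2) / m
        if x1 != x2:                 # slope undefined otherwise; skipped
            slopes.append((y1 - y2) / (x1 - x2))
    return min(slopes), max(slopes)


# Hypothetical data: two observations fall inside [median - c, median + c].
xs = [1.0, 2.0, 4.9, 5.1, 8.0, 9.0]
ys = [1.0, 2.0, 5.0, 5.0, 8.0, 9.0]
a_star, a_double_star = slope_limits(xs, ys, c=0.5)
```

The number of groupings grows combinatorially with the number of observations inside $[\tilde x - c, \tilde x + c]$, which is the computational burden the text goes on to discuss.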

Let $g$ be a grouping contained in $S$ and denote by $I_g$ the confidence interval for $\alpha$ which we obtain from formula (23) using the grouping $g$. Denote further by $I$ the smallest interval which contains the intervals $I_g$ for all elements $g$ of $S$. Then $I$ contains also the confidence interval corresponding to the grouping according to Rule II. If we denote by $P$ the chosen probability level (say $P = .95$), then we can say: If we were to draw a sample consisting of $N$ pairs of observations $(x_1, y_1), \ldots, (x_N, y_N)$, the probability is greater than or equal to $P$ that we shall obtain a system of observations such that the interval $I$ will include the true slope $\alpha$.

The computing work for the determination of $I$ may be considerable if the number of observations within the interval $[\tilde x - c, \tilde x + c]$ is not small. We can get a good approximation to $I$ with less computation as follows: First we calculate the slope $a'$ using the grouping according to Rule I and determine the confidence interval $[a' - \delta, a' + \Delta]$ according to formula (23). Denote by $a(g)$ the value of the slope, i.e. the value of $\frac{\bar y_1 - \bar y_2}{\bar x_1 - \bar x_2}$, corresponding to a grouping $g$ of the system $S$, and by $[a(g) - \delta_g, a(g) + \Delta_g]$ the corresponding confidence interval calculated from (23). Neglecting the differences $(\delta_g - \delta)$ and $(\Delta_g - \Delta)$, we obtain for $I$ the interval $[a^* - \delta, a^{**} + \Delta]$.

If the difference $a^{**} - a^*$ is small, we can consider $I = [a^* - \delta, a^{**} + \Delta]$ as the correct confidence interval of $\alpha$ corresponding to the chosen probability level $P$. If, however, $a^{**} - a^*$ is large, the interval $I$ is unnecessarily large. In such cases we may get a much shorter confidence interval by using some other grouping defined independently of the errors $\varepsilon_1, \ldots, \varepsilon_N$; $\eta_1, \ldots, \eta_N$. For instance, if we see that the values $x_1, \ldots, x_N$, considered in the order in which they have been observed, show a monotonically increasing (or decreasing) tendency, we shall define the group $G_1$ as the first half, and the group $G_2$ as the second half of the observations. Though we decide to make this grouping after having observed that the values $x_1, \ldots, x_N$ show a clear trend, the grouping can be considered as independent of the errors $\varepsilon_1, \ldots, \varepsilon_N$. In fact, if the range of the error $\varepsilon$ is small in comparison to the true part $X$, the trend of the values $x_1, \ldots, x_N$ will not be affected by the size of the errors $\varepsilon_1, \ldots, \varepsilon_N$. We may use for the grouping also any other property of the data which is independent of the errors.

The results of the preceding considerations can be summarized as follows: We use first the grouping according to Rule I, calculate the slope $a' = \frac{\bar y_1 - \bar y_2}{\bar x_1 - \bar x_2}$ and the corresponding confidence interval $[a' - \delta, a' + \Delta]$ (formula (23)). This confidence interval cannot be considered as exact since the grouping according to Rule I is not completely independent of the errors. In order to take account of this fact, we calculate $a^*$ and $a^{**}$. If $a^{**} - a^*$ is small, we consider $I = [a^* - \delta, a^{**} + \Delta]$ with practical approximation as the correct confidence interval. If, however, $a^{**} - a^*$ is large, the interval $I$ is unnecessarily large. We can only say that $I$ is a confidence interval corresponding to a probability level greater than or equal to the chosen one. In such cases we should try to use some other grouping defined independently of the errors, which may lead to a considerably shorter confidence interval.

Analogous considerations hold regarding the joint confidence region for $\alpha$ and $\beta$. We use the grouping according to Rule I and calculate from (27) the corresponding confidence region $R$. If $|a^{**} - a^*|$ and $|b^{**} - b^*|$ are small ($b^* = \bar y - a^*\bar x$ and $b^{**} = \bar y - a^{**}\bar x$) we enlarge $R$ to a region $\bar R$ corresponding to the fact that $a$ and $b$ may take any values between $a^*$ and $a^{**}$ and between $b^*$ and $b^{**}$ respectively. The region $\bar R$ can be considered with practical approximation as the correct confidence region. If $|a^{**} - a^*|$ or $|b^{**} - b^*|$ is large, we may try some other grouping defined independently of the errors, which may lead to a smaller confidence region. In any case $\bar R$ represents a confidence region corresponding to a probability level greater than or equal to the chosen one.

8. Some Remarks on the Consistency of the Estimates of $\alpha$, $\beta$, $\sigma_\varepsilon$, $\sigma_\eta$. We have shown in section 3 that the given estimates of $\alpha$, $\beta$, $\sigma_\varepsilon$ and $\sigma_\eta$ are consistent if condition V is satisfied.

If the values $x_1, \ldots, x_N$ are not obtained by random sampling, it will in general be possible to define a grouping which is independent of the errors and for which condition V is satisfied. We can sometimes arrange the experiments such that no values of the series $x_1, \ldots, x_N$ should be within the interval $[\tilde x - c, \tilde x + c]$, where $\tilde x$ denotes the median of $x_1, \ldots, x_N$ and $c$ the range of the error $\varepsilon$. In such cases, as we saw, the grouping according to Rule I is independent of the errors. Condition V is certainly satisfied if we group the data according to Rule I.

Let us now consider the case that $X_1, \ldots, X_N$ are random variables independently distributed, each having the same distribution. Denote by $X$ a random variable having the same probability distribution as possessed by each of the random variables $X_1, \ldots, X_N$. Assuming that $X$ has a finite second moment, the expression in condition V will approach zero stochastically with $N \to \infty$ for any grouping defined independently of the values $X_1, \ldots, X_N$. It is possible, however, to define a grouping independent of the errors (but not independent of $X_1, \ldots, X_N$) for which the expression in V does not approach zero, provided that $X$ has the following property: There exists a real value $\lambda$ such that the probability that $X$ will lie within the interval $[\lambda - c, \lambda + c]$ ($c$ denotes the range of the error $\varepsilon$) is zero, the probability that $X > \lambda + c$ is positive, and the probability that $X < \lambda - c$ is positive. The grouping can be defined, for instance, as follows:

The $i$-th observation $(x_i, y_i)$ belongs to the group $G_1$ if $x_i < \lambda$ and to $G_2$ if $x_i > \lambda$. We continue the grouping according to this rule up to a value $i$ for which one of the groups $G_1$, $G_2$ contains already $N/2$ elements. All further observations belong to the other group.

It is easy to see that the probability is equal to 1 that the relation $x_i < \lambda$ is equivalent to the relation $X_i < \lambda - c$ and the relation $x_i > \lambda$ is equivalent to the relation $X_i > \lambda + c$. Hence this grouping is independent of the errors. Since for this grouping condition V is satisfied, our statement is proved.

If $X$ does not have the property described above, it may happen that for every grouping defined independently of the errors, the expression in condition V converges always to zero stochastically. Such a case arises for instance if $X$, $\varepsilon$ and $\eta$ are normally distributed.¹⁴ It can be shown that in this case no consistent estimates of the parameters $\alpha$ and $\beta$ can be given, unless we have some additional information not contained in the data (for instance we know a priori the ratio $\sigma_\varepsilon/\sigma_\eta$).

9. Structural Relationship and Prediction.¹⁶ The problem discussed in this paper was the question as to how to estimate the relationship between the true parts $X$ and $Y$. We shall call the relationship between the true parts the structural relationship. The problem of finding the structural relationship must not be confused with the problem of prediction of one variable by means of the other. The problem of prediction can be formulated as follows: We have observed $N$ pairs of values $(x_1, y_1), \ldots, (x_N, y_N)$. A new observation on $x$ is given and we have to estimate the corresponding value of $y$ by means of our previous observations $(x_1, y_1), \ldots, (x_N, y_N)$. One might think that if we have estimated the structural relationship between $X$ and $Y$, we may estimate $y$ by the same relationship. That is to say, if the estimated structural relationship is given by $Y = aX + b$, we may estimate $y$ from $x$ by the same formula: $y = ax + b$. This procedure may lead, however, to a biased estimate of $y$. This is, for instance, the case if $X$, $\varepsilon$ and $\eta$ are normally distributed. It can easily be shown in this case that for any given $x$ the conditional expectation of $y$ is a linear function of $x$, that the slope of this function is different from the slope of the structural relationship, and that among all unbiased estimates of $y$ which are linear functions of $x$, the estimate obtained by the method of least squares has the smallest variance. Hence in this case we have to use the least squares estimate for purposes of prediction. Even if we knew exactly the structural relationship $Y = \alpha X + \beta$, we would get a biased estimate of $y$ by putting $y = \alpha x + \beta$.

Let us consider now the following example: $X$ is a random variable having a rectangular distribution with the range $[0, 1]$. The random variable $\varepsilon$ has a rectangular distribution with the range $[-0.1, +0.1]$. For any given $x$ let us denote the conditional expectation of $y$ by $E(y \mid x)$ and the conditional expectation of $X$ by $E(X \mid x)$. Then we obviously have

$$ E(y \mid x) = \alpha E(X \mid x) + \beta. $$

Now let us calculate $E(X \mid x)$. It is obvious that the joint distribution of $X$ and $\varepsilon$ is given by the density function

$$ 5\,dX\,d\varepsilon, $$

where $X$ can take any value within the interval $[0, 1]$ and $\varepsilon$ can take any value within $[-0.1, +0.1]$. From this we obtain easily that the joint distribution of $x$ and $X$ is given by the density function

$$ 5\,dx\,dX, $$

where $x$ can take any value within the interval $[-0.1, 1.1]$ and $X$ can take any value lying in both intervals $[0, 1]$ and $[x - 0.1, x + 0.1]$ simultaneously. Denote by $I_x$ the common part of these two intervals. Then for any fixed $x$ the conditional distribution of $X$ is given by the probability density

$$ \frac{dX}{\int_{I_x} dX}. $$

¹⁴ I wish to thank Professor Hotelling for drawing my attention to this case.

¹⁶ I should like to express my thanks to Professor Hotelling for many interesting suggestions and remarks on this subject.

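Hence, we have

$$ E(X \mid x) = \frac{\int_{I_x} X\,dX}{\int_{I_x} dX}. $$

We have to consider 3 cases:

(1) $0.1 \le x \le 0.9$. In this case $I_x = [x - 0.1, x + 0.1]$ and

$$ E(X \mid x) = \frac{\int_{x-0.1}^{x+0.1} X\,dX}{\int_{x-0.1}^{x+0.1} dX} = x. $$

(2) $-0.1 \le x \le 0.1$. Then $I_x = [0, x + 0.1]$ and

$$ E(X \mid x) = \frac{\int_{0}^{x+0.1} X\,dX}{\int_{0}^{x+0.1} dX} = .5x + .05. $$

(3) $0.9 \le x \le 1.1$. Then $I_x = [x - 0.1, 1]$ and

$$ E(X \mid x) = \frac{\int_{x-0.1}^{1} X\,dX}{\int_{x-0.1}^{1} dX} = .5x + .45. $$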
Since

$$ E(y \mid x) = \alpha E(X \mid x) + \beta, $$

we see that the structural relationship gives an unbiased prediction of $y$ from $x$ if $0.1 \le x \le 0.9$, but not in the other cases.
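The three cases of the example collapse into one rule: $E(X \mid x)$ is the midpoint of the interval $I_x$. The sketch below illustrates this for the rectangular example only; the function names are hypothetical, and the bias function simply evaluates $\alpha x + \beta - E(y \mid x) = \alpha\,(x - E(X \mid x))$.

```python
def conditional_mean_true_part(x, error_range=0.1):
    """E(X | x) for X uniform on [0, 1] and an error uniform on [-c, +c]."""
    if not -error_range <= x <= 1.0 + error_range:
        raise ValueError("x lies outside the possible range of observations")
    lower = max(0.0, x - error_range)   # left end of I_x
    upper = min(1.0, x + error_range)   # right end of I_x
    return (lower + upper) / 2.0        # mean of a uniform density on I_x


def structural_prediction_bias(x, alpha):
    """Bias of the structural prediction alpha*x + beta relative to E(y | x)."""
    return alpha * (x - conditional_mean_true_part(x))


middle = conditional_mean_true_part(0.5)   # case (1): equals x
left = conditional_mean_true_part(0.0)     # case (2): .5x + .05
right = conditional_mean_true_part(1.0)    # case (3): .5x + .45
```

The bias vanishes in the central case and is nonzero near the two ends of the range, in agreement with the conclusion just stated.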

The question of the cases in which the structural relationship is appropriate also for purposes of prediction needs further investigation. I should like to mention a class of cases where the structural relationship has to be used also for prediction. Assume that we have observed $N$ values $(x_1, y_1), \ldots, (x_N, y_N)$ of the variables $x$ and $y$ for which the conditions I-IV of section 2 hold. Then we make a new observation on $x$ obtaining the value $x'$. We assume that the last observation on $x$ has been made under changed conditions such that we are sure that $x'$ does not contain error, i.e. $x'$ is equal to the true part $X'$. Such a situation may arise for instance if the error $\varepsilon$ is due to errors of measurement and the last observation has been made with an instrument of great precision for which the error of measurement can be neglected. In such cases the prediction of the corresponding $y'$ has to be made by means of the estimated structural relationship, i.e. we have to put $y' = ax' + b$.

The knowledge of the structural relationship is essential for constructing any 
theory in the empirical sciences. The laws of the empirical sciences mostly 
express relationships among a limited number of variables which would prevail 
exactly if the disturbing influence of a great number of other variables could 
be eliminated. In our experiments we never succeed in eliminating completely 
these disturbances. Hence in deducing laws from observations, we have the 
task of estimating structural relationships. 

Columbia University, 

New York, N. Y. 



A METHOD FOR MINIMIZING THE SUM OF ABSOLUTE VALUES 

OF DEVIATIONS 

By Robert R. Singleton 

1. Introduction. In the Philosophical Magazine, 7th series, May 1930, E. C. Rhodes described a method of computation for the estimation of parameters by minimizing the sum of absolute values of deviations. His is an iterative and recursive method, in the following sense. There is a direct method for minimization with one parameter. Assuming a method for minimization with $n - 1$ parameters, Rhodes imposes a relation between the $n$ parameters (in an $n$-parameter problem) and finds a restricted minimum by the method for $n - 1$ parameters. In this sense his method is recursive. He then repeats the process, by imposing on the $n$ parameters a new relation determined by the restricted minimum. In this sense his method is iterative. The process is finite, ending when a restricted minimum immediately succeeds itself, indicating a true minimum.

Rhodes' paper presents the method without proof. The purpose of the present paper is to analyze the situation in detail sufficient to indicate proofs for various methods, and to present a new method which reduces the labor of solution by eliminating the recursive feature. The iterative approach is retained. The solution of Rhodes' illustrative problem will be given for comparison between the two methods.

The paper uses geometric terminology and develops to quite an extent the geometry of a surface representing the summed absolute deviations. This seems the clearest means of presenting the relationships. Further analysis of the properties of this surface should lead to an even more direct method for attaining the minimum than the one here presented.

In the writing of the paper, no attention has been given to sets of observations or equations among which a linear dependence may exist. In practice, such a situation almost never occurs. If the need arises, the adjustments which must be made to take care of dependence are in each case fairly obvious.

2. Geometric Analogue of Summed Absolute Deviations. Let $n$ observations on $\nu + 1$ variates be represented by $x^i_\alpha$, $y^i$ where $i = 1, \ldots, n$; $\alpha = 1, \ldots, \nu$. Unless otherwise noted, Latin indices have range 1 to $n$, Greek indices, 1 to $\nu$. The summation convention of tensor analysis is used.

The variates are to be statistically related by the linear function¹

$$ \bar y^i = x^i_\alpha u^\alpha, $$

$\bar y$ being an estimate of $y$; $u^\alpha$ are to be determined so that $v = \sum_i |\bar y^i - y^i|$ is a minimum.

¹ This includes the linear function with a constant, since a variate $x^i_1 = 1$ may be used.

Set

$$ v^i = x^i_\alpha u^\alpha - y^i \tag{1} $$

and determine functions $e^i(u^\alpha)$ so that $e^i v^i \ge 0$, and $|e^i| = 1$. It is immaterial that $e^i$ is not uniquely determined when $u^\alpha$ satisfies $v^i = 0$. Then $v = \sum_i e^i v^i$ is to be minimized. Using (1),

$$ v = x_\alpha u^\alpha - y \tag{2} $$

where

$$ x_\alpha = \sum_i e^i x^i_\alpha, \qquad y = \sum_i e^i y^i. $$

Consider a Euclidean $(\nu + 1)$-space, $E_{\nu+1}$, with coordinates $u^1, \ldots, u^\nu, v$. The coordinate hyperplane perpendicular to the $v$-axis will be called $E_\nu$. In $E_{\nu+1}$ each of equations (1) for a particular $i$ represents a $\nu$-plane which intersects $E_\nu$ in a $(\nu - 1)$-plane when $v^i = 0$. Each of the equations

$$ v^i = e^i (x^i_\alpha u^\alpha - y^i) \tag{3} $$

represents two half-planes which touch $E_\nu$ and each other along the $(\nu - 1)$-plane given in $E_\nu$ by the equation

$$ x^i_\alpha u^\alpha - y^i = 0. \tag{4} $$

The functions on the right-hand side of (3) are thus continuous everywhere, and linear in any neighborhood of $E_\nu$ none of whose points satisfies (4). Since a sum of functions continuous and linear in a neighborhood is also continuous and linear in that neighborhood, it follows that the function on the right in (2) is continuous for all $u$, and linear for every neighborhood of $E_\nu$ containing no points which satisfy (4) for any $i$. Hence

Observation I: The surface ($S$) given in $E_{\nu+1}$ by (2) consists of portions of $\nu$-planes joined together. The projection of these joins on $E_\nu$ forms a network of $(\nu - 1)$-planes determined in $E_\nu$ by equations (4).
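The identity behind equation (2), namely that the summed absolute deviation equals the linear form $x_\alpha u^\alpha - y$ built with the signs $e^i$, can be checked numerically. The sketch below is illustrative only: the function names are hypothetical, and $e^i$ is set to $+1$ when a deviation is exactly zero, a choice the text notes is immaterial.

```python
def deviations(rows, ys, u):
    """v^i = x^i_alpha u^alpha - y^i for every observation i, as in (1)."""
    return [sum(x * w for x, w in zip(row, u)) - y for row, y in zip(rows, ys)]


def summed_absolute_deviation(rows, ys, u):
    return sum(abs(d) for d in deviations(rows, ys, u))


def linear_form(rows, ys, u):
    """Return (x_alpha, y) of (2): sign-weighted sums over the observations."""
    signs = [1 if d >= 0 else -1 for d in deviations(rows, ys, u)]
    nu = len(u)
    x_alpha = [sum(e * row[a] for e, row in zip(signs, rows)) for a in range(nu)]
    y_total = sum(e * y for e, y in zip(signs, ys))
    return x_alpha, y_total


# Hypothetical data: a line with constant term, so the first variate is 1.
rows = [[1.0, 0.0], [1.0, 1.0], [1.0, 2.0]]
ys = [1.0, 3.0, 2.0]
u = [0.5, 1.0]
x_alpha, y_total = linear_form(rows, ys, u)
v_linear = sum(x * w for x, w in zip(x_alpha, u)) - y_total
v_direct = summed_absolute_deviation(rows, ys, u)
```

Within one face of $S$ the signs do not change, so `x_alpha` is constant there and its negative gives the direction of steepest descent used in the later sections.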

3. Existence of a Minimum. Define a "bend of degree $r$ on $S$" to be the locus of all points on $S$ whose $u$-coordinates satisfy a set of $r$ independent equations of (4). To each set of $r$ independent equations corresponds a unique bend of degree $r$.

If a linear relation $u^\alpha = a^\alpha_\sigma \lambda^\sigma + b^\alpha$, $\sigma = 1, \ldots, \mu < \nu$, rank $(a^\alpha_\sigma) = \mu$, is imposed on $u^\alpha$, all the preceding development, reduced in dimension, applies to the new variates $x^i_\alpha a^\alpha_\sigma$, $y^i - x^i_\alpha b^\alpha$.

Observation II: A section of $S$ by a plane of any dimension $d < \nu$ has all the properties of an $S$-surface of dimension $d$.

Since any set of consistent equations selected from (4) determines such a linear relation for $u^\alpha$, the application of Observation I to any of the bends of $S$ shows that each $r$-bend consists of linear elements of dimension $\nu - r$, joined at points which lie on linear elements of lesser dimension. Thus $S$ is a polyhedron. Its faces we term complexes of dimension $\nu$, $C_\nu$, and the linear elements of its edges which lie wholly in bends of degree $r$, but not of degree $r + 1$, are complexes $C_{\nu-r}$ of dimension $\nu - r$. The boundary of any $C_\sigma$, $\sigma > 0$, consists of complexes of lesser dimension. The term complex is not restricted to either open or closed complexes.

Since the function $v(u^\alpha)$ of (2) is non-negative, it possesses a greatest lower bound (g.l.b.) $g$. Since for some number $h > g$, there exists an $N$ such that for all $|u^\alpha| > N$, $v(u^\alpha) > h$, it follows that for some closed neighborhood of $E_\nu$ the g.l.b. of $v$ is $g$. Since $v$ is continuous everywhere it attains its g.l.b., and so $S$ has minimum points. Since the minimum of any complex not parallel to $E_\nu$ lies on its boundary, and the boundary consists of complexes, it follows that the minimum points of $S$ consist of $C_0$'s and/or entire complexes of dimension $> 0$ which are parallel to $E_\nu$. The next section will show that $S$ has a unique minimum complex (including of course its boundary complexes) and furthermore is cup-shaped.


[Fig. 1: vertical axis $v$]



4. Convexity Property; Uniqueness of the Minimum. Consider $\nu = 1$ in the preceding treatment (the Greek index, for convenience, not being written). $S$ looks generally like Fig. 1. The slope changes only where an equation of (4) has a root. Suppose the point is $u_0$, and $x^1 u_0 - y^1 = 0$. From (3), since $v^1 \ge 0$, it follows that $e^1 x^1 < 0$ for $u < u_0$, $e^1 x^1 > 0$ for $u > u_0$. Since in (2) $x = \sum_i e^i x^i$, and since for $h$ sufficiently small and $u_0 - h < u < u_0 + h$ the only $e$ to change value² is $e^1$, we have that

$$ x(u_1) + 2\,|e^1 x^1| = x(u_2) $$

where

$$ u_0 - h < u_1 < u_0 < u_2 < u_0 + h. $$

Hence the slope is a monotonic increasing step function. Since for $u$ sufficiently small all $e^i x^i < 0$, and for $u$ sufficiently large all $e^i x^i > 0$, at some intermediate point or points either the slope is zero or it changes from negative to positive without becoming zero. In the first case a single closed $C_1$ is the minimum complex; in the second, a $C_0$. In either case the curve given by (2) when $\nu = 1$ is concave upward and has just one minimum complex, except for complexes of lesser dimension constituting the boundary of this complex. An obvious consequence is

² The $e$'s corresponding to equations proportional to equation (1) also change value at $u_0$. This does not destroy the argument.

Lemma I. The set of points $u$ for which $v$ is less than some number $N$ form a convex point set.

This result is easily extended to the general dimension $\nu$. If for any two points $u_1$, $u_2$ of $E_\nu$, $v(u_1) < N$ and $v(u_2) < N$, the plane in $E_{\nu+1}$ given by $u^\alpha = u_1^\alpha + \lambda(u_2^\alpha - u_1^\alpha)$ makes a one-dimensional section of $S$. By Observation II, the points $u$ lying on the projection of this section on $E_\nu$ have the property of Lemma I and of course lie on the straight line joining $u_1$ and $u_2$. This is the property required for a convex point set. Hence

Theorem I. The set of points $u^\alpha$ of $E_\nu$ for which $v(u^\alpha)$ as given by (2) is less than a fixed quantity form a convex point set.

From this it follows immediately that there is a unique minimum complex. It is appropriate here to point out that no two complexes can be contained in a single plane of the same dimension. This follows from the equation giving monotoneity of slope in one dimension, and Observation II.
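The one-parameter argument translates directly into a procedure: sort the roots $y^i/x^i$ of equations (4), start from the slope $-\sum_i |x^i|$ that holds to the left of every root, and add $2|x^i|$ as each root is passed until the slope is no longer negative. The sketch below follows that reading; the function name is hypothetical, and observations with $x^i = 0$ are assumed to be dropped since they add only a constant to $v$.

```python
def minimize_one_parameter(xs, ys):
    """Minimize sum_i |x_i * u - y_i| over a single parameter u.

    Returns (u, slope), where slope is the slope of v just to the right of u.
    A slope of zero means the minimum complex is a closed segment (a C_1)
    starting at u; a positive slope means it is the single point u (a C_0).
    """
    breakpoints = sorted((y / x, abs(x)) for x, y in zip(xs, ys) if x != 0)
    if not breakpoints:
        raise ValueError("need at least one observation with x != 0")
    slope = -sum(weight for _, weight in breakpoints)   # slope left of all roots
    for root, weight in breakpoints:
        slope += 2.0 * weight        # the step 2|e x| derived in the text
        if slope >= 0:
            return root, slope
    raise AssertionError("unreachable: the final slope equals sum of |x_i| > 0")


def objective(xs, ys, u):
    return sum(abs(x * u - y) for x, y in zip(xs, ys))


point_minimum = minimize_one_parameter([1.0, 1.0, 1.0], [1.0, 2.0, 10.0])
flat_minimum = minimize_one_parameter([1.0, 1.0], [0.0, 4.0])
```

With all $x^i = 1$ the procedure returns a median of the $y^i$, which is the familiar special case.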

5. Gradient Directions. From here on the treatment will be of $v$ as a function defined on $E_\nu$, and the equations will represent objects in $E_\nu$, unless otherwise stated. Complex and Bend also will refer to the projections on $E_\nu$ of the complexes and bends of $S$. For a single-valued function defined on $E_\nu$ the gradient at a point is the projection of a normal to the surface representing the function in $E_{\nu+1}$. If the function is defined only over a subspace of $E_\nu$ possessing derivatives, the gradient will be required also to be tangent to the subspace. This is sufficient to determine a unique direction, and preserves the property that for an infinitesimal displacement in any direction the value of the function decreases most rapidly in the direction of the gradient. Here gradient is taken negative to its usual sense.

A point $u$ lying on a $C_r$ but not on a $C_{r-1}$ will have a gradient in $C_r$ and also in each higher-dimensional complex on whose boundary $C_r$ lies. If the gradient for $u$ as a point of $C_{r+k}$ points into $C_{r+k}$ (remembering that $u$ lies on the boundary) this will be called a usable gradient. In the case of the greatest $k$ for which there exists a usable gradient, there exists but one $C_{r+k}$ providing such a gradient, and that gradient is the "best" gradient; that is, of all directions in $E_\nu$ it provides the direction of most rapid decrease of the function $v$. This follows from Theorem I. Furthermore, all complexes of lesser dimension providing usable gradients lie on the boundary of this $C_{r+k}$. In fact

Theorem II. If for a point $u$ on $C_r$, two complexes $C_s$ and $C'_s$, $s > r$, lying in different bends of degree $\nu - s$ but incident at $C_r$, both provide usable gradients for $u$, then the complex $C_{s+1}$ on whose boundary lie both $C_s$ and $C'_s$ also provides a usable gradient for $u$.

This follows from Theorem I. Select $u_1$ on the gradient in $C_s$, $u_2$ on the gradient in $C'_s$, for which $v(u_1) = v(u_2)$. The join of $u_1$ and $u_2$ lies in $C_{s+1}$, and for some point $u_3$ on this join, $v(u_3)$ is less than $v(u_1) = v(u_2)$. Also, the distance $uu_3$ is less than at least one of $uu_1$, $uu_2$. Hence $C_{s+1}$ must contain a usable gradient.


6. Selection of Best Gradient at Bends. The direction of the gradient for a point $u_0$ considered as lying on a $C_\nu$ is given by

$$ g_\alpha = -x_\alpha(u_0) = -\sum_i e^i(u_0)\,x^i_\alpha. \tag{5} $$

If $u_0$ lies in the interior of a face, this is unique. If $u_0$ lies in a bend, so that some $e$ are not determined, the $g_\alpha$ for each face is found by selecting the indeterminate $e$'s as $+1$ or $-1$, according to the face being considered.

For a point $u_0$ considered as lying on a bend of degree $r$, given by $r$ independent equations of (4):

$$ x^\lambda_\alpha u^\alpha - y^\lambda = 0, \qquad (\lambda = 1, \ldots, r), \tag{6} $$

the gradient for a particular $C_{\nu-r}$, determined by the conditions at the beginning of section 5, is

$$ g^\alpha = x^\lambda_\alpha k_\lambda - x_\alpha \tag{7} $$

where $k$ satisfies

$$ \sum_\alpha x^\mu_\alpha x^\lambda_\alpha k_\lambda = \sum_\alpha x^\mu_\alpha x_\alpha, \qquad (\mu = 1, \ldots, r) $$

and $x_\alpha$ is as given in (2), the choice of sign for the indeterminate $e^\lambda$ $(\lambda = 1, \ldots, r)$ being immaterial. They may, in fact, be taken as 0 in this instance.

For a point $u_0$ lying on an $r$-bend given by (6), to determine which complex contains the best gradient, each $(r - 1)$-bend incident on the $r$-bend at $u_0$ is tested for a usable gradient. Theorem II then determines the complex containing the best gradient.

There are $2r$ such complexes incident at $u_0$, given by the $r$ sets of equations selected from (6):

$$ (8\lambda):\quad x^\sigma_\alpha u^\alpha - y^\sigma = 0 \qquad (\sigma = 1, \ldots, \lambda - 1, \lambda + 1, \ldots, r), \quad (\lambda = 1, \ldots, r). \tag{8} $$

The two complexes lying in the same $(r - 1)$-bend have the same equations in (8), but are distinguished later by $e^\lambda(u_0)$ for the omitted equation being taken first $+1$, then $-1$.

The gradient for the $\lambda$th pair of complexes is

$$ g^\alpha_\lambda = x^\sigma_\alpha k_\sigma - x_\alpha $$

similar to (7), but not identical. For $e^\lambda = +1$ in determining $x_\alpha$, we have $g^\alpha_{\lambda+}$, and for $e^\lambda = -1$, $g^\alpha_{\lambda-}$. We restrict the consideration to $e^\lambda = +1$. The line in the direction of greatest slope is then

$$ u^\alpha = u_0^\alpha + g^\alpha_{\lambda+}\,t. $$

Now $u_0$ is here considered lying on the complex given by (8$\lambda$) with $e^\lambda = +1$. In order that $g^\alpha_{\lambda+}$ point into this face, the deviation for the $\lambda$th observation must exceed 0 when $t > 0$; otherwise, for a displacement in the direction of $g_{\lambda+}$, $e^\lambda$ changes sign immediately and the course is in the other complex. This deviation is

$$ v^\lambda = x^\lambda_\alpha u^\alpha - y^\lambda = x^\lambda_\alpha u_0^\alpha - y^\lambda + x^\lambda_\alpha g^\alpha_{\lambda+}\,t = x^\lambda_\alpha g^\alpha_{\lambda+}\,t. $$

Had $g_{\lambda-}$ been used, this deviation must be less than 0. Hence a necessary and sufficient condition that a complex given by (8) with either choice of $e^\lambda$ possess a usable gradient is

$$ \Phi_\lambda = e^\lambda\left[\sum_\alpha x^\lambda_\alpha x^\sigma_\alpha k_\sigma - \sum_\alpha x^\lambda_\alpha x_\alpha\right] > 0. \tag{9} $$

For $r = 1$ the condition is given by (9) with the first sum merely omitted. $\Phi_{\lambda+}$ and $\Phi_{\lambda-}$ cannot both exceed 0.

When all sets of equations (8$\lambda$) are tested by (9) the equations common to all sets possessing a usable gradient determine the complex with the best gradient, retaining the values of $e$ for which (9) was satisfied.

7. Property of the Minimum Point. For a minimum point, given by ( 6 ) 
with r = v, all $>x must be negative. Define X^ y = X a xixl and X po = 2 a XaX* 
for convenience. Then in (9), the numbers k„, —l are seen from their defini¬ 
tion in (7) to be proportional to the cofactors of the Xth row of the matrix 
(X 1 ", X' 1 ’ 1 '), p. having the same range as X. Thus i , x+ = c Det (X^, X+), and 
<$x- = — c Det (X M , X-), where in the first case X '' 0 is determined with e x = +1, 
in the second with e = —1. The factor of proportionality, c, must be the 
same since X'" is unaffected by change of e. Now let X 11 = 2 a XaX n where 
x* = hijtxa , the range of k omitting the range of X. Then 

$>x+ = c [Det (X"', XO + Det (X"", X 1 *)] 

and 

$x_ = -c [Det {XT, X") - Det (X"', X*)]. 

Hence 

$x+$x- = -c 2 ([Det (X" f , X")] 8 - [Det (X M , X'*)] 2 ). 

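Now let $A$ represent the square matrix $(x^{\lambda}_{\alpha})$, $\alpha$ giving the rows and $\lambda$ the columns.
Let $B_\lambda$ represent the matrix formed from $A$ by replacing the $\lambda$th column by $x^*_\alpha$.
Then

$\Phi_{\lambda+}\Phi_{\lambda-} = -c^2\,[\mathrm{Det}^2(A'B_\lambda) - \mathrm{Det}^2(A'A)]$
$\qquad = -c^2\,\mathrm{Det}^2 A\,(\mathrm{Det}^2 B_\lambda - \mathrm{Det}^2 A)$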





and this will have the same sign as

$\Psi_\lambda = |\mathrm{Det}(A)| - |\mathrm{Det}(B_\lambda)|$.

Since $\Phi_{\lambda+}$ and $\Phi_{\lambda-}$ are never both positive, and at the minimum are both
negative for all $\lambda$, at the minimum all $\Psi_\lambda > 0$. To determine all $\Psi_\lambda$ together, let,
in matrix notation, $z' = (z_1, \ldots, z_\nu)$ and $x^{*\prime} = (x^*_1, \ldots, x^*_\nu)$ where the $x^*_\lambda$ were
defined previously. Determine $z$ as the solution of $Az = x^*$. Then, by Cramer's rule,
the $|\mathrm{Det}(B_\lambda)|$ are equal to $|z_\lambda|\,|\mathrm{Det}(A)|$. Hence a necessary and sufficient condition that
$\Psi_\lambda > 0$ for all $\lambda$ is that all $|z_\lambda|$ be less than one. Hence
Theorem III: If a zero-complex is given by a set of equations whose matrix is M,
a necessary and sufficient condition that the complex be a unique minimum is that
the solutions of $M'z = x^*$ be all less than one in absolute value. If k of the
solutions are equal to one in absolute value, and the rest are less than one, the minimum
is a complex of dimension k with the zero-complex as one of its corners.

The last statement follows since if one solution is 1 in absolute value, a
corresponding $\Phi_\lambda = 0$, and hence no gradient, usable or not, exists. Thus the
corresponding complex is parallel to $E_\nu$.
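The test of Theorem III can be sketched numerically. The block below is only an illustration under stated assumptions: the matrix `A` and the vector `x_star` are made-up values, the function names are hypothetical, and the system is written as $Az = x^*$ (whether the paper's matrix or its transpose is intended depends on the row and column convention in use). It shows the Cramer's-rule identity $|\mathrm{Det}(B_\lambda)| = |z_\lambda|\,|\mathrm{Det}(A)|$ and the "all $|z_\lambda| < 1$" check in exact arithmetic.

```python
from fractions import Fraction


def det(m):
    """Determinant by Laplace expansion along the first row (small matrices only)."""
    if len(m) == 1:
        return m[0][0]
    total = Fraction(0)
    for j, a in enumerate(m[0]):
        minor = [row[:j] + row[j + 1:] for row in m[1:]]
        total += (-1) ** j * a * det(minor)
    return total


def replace_column(m, j, col):
    """Return a copy of m whose j-th column is replaced by col (the matrix B_j)."""
    return [row[:j] + [col[i]] + row[j + 1:] for i, row in enumerate(m)]


def cramer_solution(a, x_star):
    """Solve a z = x_star through z_j = Det(B_j) / Det(a)."""
    d = det(a)
    return [det(replace_column(a, j, x_star)) / d for j in range(len(a))]


# Hypothetical data, chosen only to exercise the test.
A = [[Fraction(v) for v in row] for row in [[2, 1, 0], [1, 3, 1], [0, 1, 4]]]
x_star = [Fraction(1), Fraction(2), Fraction(-1)]

z = cramer_solution(A, x_star)
# Condition of Theorem III for a unique minimum: every |z_j| below one.
is_unique_minimum = all(abs(zj) < 1 for zj in z)
```

For a system of realistic size an elimination routine would replace the cofactor expansion; the expansion is used here only because it mirrors the determinants $B_\lambda$ of the text.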

8. Minimization for One Dimension. A method for minimization of (2) when
there is just one parameter evolves from the monotonicity of slope in that case.
Suppose the variates are $w^i$ and $z^i$, and (1) is

(10)  $v^i = w^i t - z^i$.

Suppose the variates are arranged in order of $z^i/w^i$, starting with the smallest.
The slope of the $r$th segment (Fig. 1) from the left is

$\sum_{i=1}^{r} |w^i| - \sum_{i=r+1}^{n} |w^i|$.

The minimum occurs when the slope is 0 or changes from negative to positive;
that is, when the first sum equals or exceeds the second; or when the first sum
equals or exceeds half the total. This is a standard computation. If the
change takes place when $r = k$, then $t = z^k/w^k$ is the value of $t$ giving the
minimum.
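The rule of this section amounts to a weighted median of the ratios $z^i/w^i$ with weights $|w^i|$. The following is a minimal sketch under assumptions of our own: the function names and the sample data are hypothetical, and terms with $w^i = 0$ are simply skipped because they do not depend on $t$.

```python
def minimize_abs_line(w, z):
    """Return a t minimizing sum_i |w_i * t - z_i| by the running-sum rule."""
    # Order the observations by z_i / w_i and carry |w_i| along as a weight.
    pairs = sorted((zi / wi, abs(wi)) for wi, zi in zip(w, z) if wi != 0)
    if not pairs:
        raise ValueError("every w_i is zero, so the sum does not depend on t")
    half = sum(weight for _, weight in pairs) / 2.0
    running = 0.0
    for ratio, weight in pairs:
        running += weight
        # First point at which the accumulated weight reaches half the total.
        if running >= half:
            return ratio
    return pairs[-1][0]


def objective(w, z, t):
    """Sum of absolute deviations |w_i * t - z_i|."""
    return sum(abs(wi * t - zi) for wi, zi in zip(w, z))


# Hypothetical data for illustration.
w = [1.0, 2.0, -1.0, 4.0]
z = [3.0, 2.0, 2.0, 2.0]
t_min = minimize_abs_line(w, z)
```

With these data the ordered ratios are $-2, 0.5, 1, 3$ with weights $1, 4, 2, 1$; the running sum first reaches half of the total weight 8 at the second ratio, so the sketch returns $t = 0.5$.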

9. Minimization Procedure for v + 1 Dimensions. For any continuous function
with unique minimum and having the property of Theorem I, the following
holds. Let $u_0$ be any point of $E_\nu$. Let $u_{i+1} = u_i + A_i t_i$, where $A_i$ is any
direction chosen at random and $t_i$ is the value of $t$ for which the function attains
a minimum on the curve $u = u_i + A_i t$. Then the probability is one that
$\lim u_i = \bar u$, where $\bar u$ is a minimum point for the function. If $A_i$ is taken
always as the gradient at $u_i$, such a procedure is called the “method of steepest
descent” for approaching the minimum point.

Usually the limit is never attained. In this case, however, the minimum is
attained. The minimum can be approached as closely as desired, hence a
complex incident on the minimum is reached. But the convex point sets of
Theorem I surrounding the minimum complex are all similar convex polyhedrons
in $E_\nu$, whose corresponding faces are parallel, and the gradients at
points on a bend cannot point into a higher dimensional complex on the bend.
Hence the sequence of points lies on bends of successively greater degree, and
must eventually attain the minimum complex.

TABLE II

Points $u_k$

$u_0$ = (38, -5, -2)
$u_1$ = (37.98202, -4.74828, -1.48457)
$u_2$ = (37.45908, -2.07142, -1.85631)
$u_3$ = (32.83333, -2.07142, -1.76191)


TABLE III

Computation of $t_k = z_i/w_i$

  Sum          in order of col.   exceeds   at i =   hence t_k =
  Σ|w_0|                           17521      16      .00599334
  Σ|w_1|            (15)            2502       2      .0397792
  Σ|w_2|                            4610              .00496545


TABLE IV

Gradients $g_k^{\alpha}$ for column (5k + 8)

  k      g_k^1      g_k^2     g_k^3
  0         -3         42        86
  1     -13146      67293     -9345
  2    -931588          0



The computational procedure is as follows:

1. Select a point $u_0$.

2. Determine the gradient $g_0^{\alpha}$ from (5).

3. Compute $w_0^i = x^i_{\alpha} g_0^{\alpha}$, $z_0^i = y^i - x^i_{\alpha} u_0^{\alpha}$.

4. Determine $t_0$ by the method of section 8.

5. Compute $u_1^{\alpha} = u_0^{\alpha} + g_0^{\alpha} t_0$.

6. Determine the complex containing the best gradient by (9), and the
gradient $g_1^{\alpha}$ by (7).

and so proceed to the minimum. This may be finally tested by Theorem III.

Step 5 is unnecessary, since the only use for $u_1^{\alpha}$ is to determine $e^i(u_1)$. But
$e^i(u_1) = e^i(t_0)$, the latter referring to the computation in step 4. Also, after
the first step, it is easier to compute $z^i$ by

$z^i_{k+1} = z^i_k - w^i_k t_k$.

10. Example. The computation for (9) is not so great as it would seem, since
some of the work is duplication and some must be computed anyway for the
gradient. Even so, for $r > 3$ it becomes, perhaps, more arduous than its
contribution would seem to justify. For $\nu > 4$ it is recommended that the
test of (9) be omitted for points on bends of third degree or greater, and the
final test of Theorem III be applied at the end of the work. If this test shows
the minimum has not been reached, the complex in which lies the best gradient
will be indicated at the same time.

The minimum number of steps is 0. The maximum number is tremendous 
but finite. The expected number is probably a little greater than v. 

In Tables I to IV, the method is applied to the problem used by Rhodes to
illustrate his method. The independent variates are shown in columns (2), (3),
(4), Table I, the dependent variate in column (5). The only other original
datum is the initial point, selected by guess, shown in line 1, Table II. Since
slightly different formulas were used in the computation, the signs of cols.
(6), (8), (11), (16), (18) are reversed, and the gradients in Table IV are
multiplied by constants. As they are used only for directions, this does not
matter.
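The first step of the example can be checked by arithmetic. The sketch below takes $g_0 = (-3, 42, 86)$ from Table IV and $t_0 = .00599334$ from Table III and recomputes $u_1 = u_0 + g_0 t_0$. It assumes that the second coordinate of the initial point is $-5$, which is the value the tabulated $u_1$ requires; the function name `step` is ours.

```python
def step(u, g, t):
    """One displacement u + g * t along the direction g."""
    return tuple(ui + gi * t for ui, gi in zip(u, g))


u0 = (38.0, -5.0, -2.0)      # initial point (sign of the second entry assumed)
g0 = (-3.0, 42.0, 86.0)      # first row of Table IV
t0 = 0.00599334              # first row of Table III

u1 = step(u0, g0, t0)
u1_table = (37.98202, -4.74828, -1.48457)   # second row of Table II
max_gap = max(abs(a - b) for a, b in zip(u1, u1_table))
```

The recomputed point agrees with the tabulated $u_1$ to the five decimals printed. The later rows cannot be checked in the same direct way, since the gradients in Table IV are stated to be multiplied by constants.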

Princeton University, 

Princeton, N. J, 



A STUDY OF A UNIVERSE OF n FINITE POPULATIONS WITH 
APPLICATION TO MOMENT-FUNCTION ADJUSTMENTS 
FOR GROUPED DATA 

By Joseph A. Pierce 

The object of this paper is to study the case of a universe of n finite popula¬ 
tions, considering both the expectations of population moment-functions and 
the moments of sample moments, and to make applications of the results which 
may be of interest to mathematical statisticians. The sampling formulas which 
are derived reduce to the usual infinite or finite sampling formulas, under 
appropriate assumptions. Also a method is given whereby finite sampling 
formulas may be transformed into the corresponding infinite sampling formulas. 

The general methods and formulas which are given in Part I for the expecta¬ 
tions of population moment-functions are used, in Part II, to find the expecta¬ 
tions of moments of a distribution of discrete data grouped in “k groupings 
of k”. 


I. A Study of a Universe of n Finite Populations 

Let ${}_nU_N$ be a universe composed of the set of populations ${}_rX$, $(r = 1, 2, \ldots, n)$,
each population ${}_rX$ consisting of a finite number of discrete variates ${}_rx_i$,
$(i = 1, 2, \ldots, N)$, $(N > n)$. The $t$th moment of ${}_rX$ is denoted by ${}_r\mu_t$. The
$t$th central moment of ${}_rX$ is denoted by ${}_r\bar\mu_t$. The $t$th moment and the $t$th central
moment of ${}_nU_N$ are respectively denoted by $\mu_t$ and $\bar\mu_t$. The expected value of a
variable $y$ is denoted by $E(y)$. We have

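${}_r\mu_t = E({}_rx_i^t) = \frac{1}{N}\sum_{i=1}^{N} {}_rx_i^t$;   ${}_r\bar\mu_t = E[({}_rx_i - {}_r\mu_1)^t] = \frac{1}{N}\sum_{i=1}^{N} ({}_rx_i - {}_r\mu_1)^t$,

$\mu_{1:\mu_t} = E({}_r\mu_t) = \frac{1}{n}\sum_{r=1}^{n} {}_r\mu_t$;   $\mu_{1:\bar\mu_t} = E({}_r\bar\mu_t) = \frac{1}{n}\sum_{r=1}^{n} {}_r\bar\mu_t$;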

_ jpf ,,'2 , •v'l 

■jp/ -BJ -82 »8t)\ 

We also note that Hv may be written /un ■■utri)/,*' ij*; • 

1. The expected value of moments and central moments. It follows easily
from (1.1) that

(1.2)  $\mu_{1:\mu_t} = \mu_t$.

From the usual formula for central moments in terms of moments, we get

(1.3) Ml fit = 2 ( _ 1 ) 

Terms of the form pa tHI ,,- t may be evaluated by use of the well known formulas 
[20; p, 58] for changing from moments to central moments in the case of a multi¬ 
variate distribution. Two of these formulas are given below. 

Mii:fi a )H ~ Mli:w» Mi0:/i o jibM0i.a.ia • 

PllVHaVtHc ~ Mill /laUbMo Mll0:/l o »‘b»‘oM001^ ll )Jb/'o 

(1.4) 

— ftioi.VaMiicPoio Mobile ~~ MoinnoMbfJcMioo.^^^ 
4" 2 Mioo.^ 0 nb»ioMoii).jiaiib(i 0 Mooi:(i 0 >ibji e • 

We find that 

(1-5) MUwi-i = 2 -|—j-|—i Mpin^iPf-,MiiJiM 1 ^!-. > 

where P 1 P 2 is a two-part partition of i and n + rs = 1 . 

Using (1.3) and (1.5), we get 

( 1 . 6 ) Mi:ii = M 2 — Maj*! • 

(1-7) Puf, = M3 ~ 3mu.iuhi + 6MlM2).i + 2 Wmi • 

(1.8) Ml'pb = Ml + 6(M2 - 2 mi) M2>! — 12miM3:mi + 12aiMll'(i,>j 2 

— 4mu:^i MJ + 6m 21:^ iM , — 3. 
etc. 

If the n populations are identical, it is evident from the definition of $\mu_{1:\bar\mu_t}$
that, for all finite $t$,

$\mu_{1:\bar\mu_t} = \bar\mu_t$.
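Relations of this section can be checked on a small universe by direct enumeration. The sketch below is an illustration only: the universe of three populations is hypothetical, the helper names are ours, and (1.6) is taken in the reading "expected central second moment equals the second moment of the universe less the second moment of the population means", which holds both about a fixed point and about the universe mean.

```python
from fractions import Fraction


def moment(values, t):
    """t-th moment about the origin, in exact arithmetic."""
    return sum(Fraction(v) ** t for v in values) / len(values)


def central_moment(values, t):
    """t-th moment about the mean of the same values."""
    mean = moment(values, 1)
    return sum((Fraction(v) - mean) ** t for v in values) / len(values)


# Hypothetical universe: n = 3 populations of N = 3 variates each.
universe = [[1, 2, 6], [0, 4, 5], [2, 2, 8]]
n = len(universe)
pooled = [x for population in universe for x in population]

# (1.2): the expected value of the population t-th moments is the universe moment.
expected_mu3 = sum(moment(p, 3) for p in universe) / n
universe_mu3 = moment(pooled, 3)

# (1.6), as read above: expected within-population variance.
expected_var = sum(central_moment(p, 2) for p in universe) / n
means = [moment(p, 1) for p in universe]
about_fixed_point = moment(pooled, 2) - moment(means, 2)
about_universe_mean = central_moment(pooled, 2) - central_moment(means, 2)

# Identical populations: the expectation reduces to the population value itself.
identical = [[1, 2, 6]] * 4
expected_identical = sum(central_moment(p, 3) for p in identical) / len(identical)
```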


2. The expected value of Thiele seminvariants. If the $t$th Thiele seminvariant
is denoted by $\lambda_t$, then

(-lr^Kp-Di 

U ; MUX, i Sl ! Ss !... S J(21)M(3|).. ... («!)*»*** 1,, ‘ 

the summation being taken for all positive integers s,(i = 1, 2, • • ■ v), for which 

V V 

P ~ ) t “ ^ ^3* • 

t-l t~l 

Terms of the form p aiSl . Mv are evaluated by (1,4). We have 


( 1 . 10 ) MI.X 2 = ^2 — P2.fi\
(1.11) m 3 - h ~ + 6Xi^2m + 2/u.,,, . 

(1 12) Mix, = X 4 + 12[X 2 — 2Xx] /x 2 hi — 24Xijua (ll + 24Xi£n M1M2 
— 4Mu : ^ lMa + 12 m 2 i Mi(18 — 3/Z 2>1 — 6 / 14 .^ - 
etc. 

If the n populations are identical then, for all finite $t$,

$\mu_{1:\lambda_t} = \lambda_t$.


3. Generalized sampling. It follows from definition that all rational isobaric
moment-functions have the property that they may be expressed in terms of
power sums and power product sums with certain coefficients. Of the power
sums and power product sums which enter a sampling formula only the power
product sums take different forms depending on the law of variate selection.
Now, there are two possible courses which may be followed by one who wishes to
derive sampling formulas for the case of a single population.

1 . One may decide in advance on the law which he wishes to govern the 
selection of variates which enter the sample. Then he may apply this law in 
the evaluation, in terms of moments, of every power product term as it occurs 
in each formula which is derived. 

2. One may derive the formulas for sampling under the condition that the
law is unspecified, thereby obtaining formulas which are capable of being
interpreted in terms of laws that are decided upon later.

We illustrate the two possible courses by considering the formula,


(1.13)  $\bar\mu_{2:z} = \frac{r}{s}\sum x_i^2 + \frac{2r(r - 1)}{s(s - 1)}\sum_{i<j} x_i x_j$,

which Carver [12; p. 102] obtains for the case of finite sampling without
replacements. Here $r$ = the number in the sample, $s$ = the number in the parent
population and $z_i$ = the algebraic sum of the variates of the $i$th sample. Later,
by evaluating $\sum x_i^2$ and $\sum x_i x_j$ in terms of moments, he finds


(1.14)  $\bar\mu_{2:z} = \frac{r(s - r)}{s - 1}\,\bar\mu_{2:x}$.

(It should be noted that Carver [12; p. 115] obtained the corresponding formula
for infinite sampling by letting $s \to \infty$.)

The preceding development is entirely in accord with the first of the courses
stated above. It is also the standard procedure and is the course followed by
such writers as Isserles [2], Neyman [6], Church [7], Pepper [11] and Dwyer [20],
in deriving finite sampling formulas. Also, it is the course followed by such
authors as “Student” [1], Tchouproff [3], Church [5], Craig [9], Fisher [10], and
Georgesque [13] for the case of sampling from an infinite population.
However, in (1.13), it is possible to employ the definition,

$\frac{2}{s(s - 1)}\sum_{i<j} x_i x_j = \bar\mu_{1,1}$.

Then (1.14) becomes

(1.15)  $\bar\mu_{2:z} = r\bar\mu_2 + r(r - 1)\bar\mu_{1,1}$.

Formula (1.15) may be interpreted as holding for either finite or infinite
sampling, depending on the interpretation which is given to $\bar\mu_{1,1}$. It may be
easily shown that, if the sampling is from a limited supply, $\bar\mu_{1,1} = -\frac{1}{s - 1}\bar\mu_2$ and
(1.15) reduces to (1.14). If the sampling is from an infinite supply, $\bar\mu_{1,1}$ becomes
$\bar\mu_1^2$ and therefore

$\bar\mu_{2:z} = r\bar\mu_{2:x}$,

which is the formula [12; p. 115] that corresponds, in the infinite case, to (1.14).
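The passage from (1.15) to (1.14) can be confirmed by enumerating every sample drawn without replacement from a small population. The following sketch is illustrative only: the population is hypothetical, the variable names are ours, and the variates are measured from the population mean so that the second moment of the sample sum is its variance.

```python
from fractions import Fraction
from itertools import combinations

population = [1, 3, 4, 7, 10]          # hypothetical parent population, s = 5
s, r = len(population), 3              # r variates per sample

mean = Fraction(sum(population), s)
dev = [x - mean for x in population]   # deviations about the population mean

mu2_bar = sum(d * d for d in dev) / s
# Definition used with (1.13): mean product of two distinct variates.
mu11_bar = Fraction(2, s * (s - 1)) * sum(a * b for a, b in combinations(dev, 2))

# Second moment of the sample sum z over all samples without replacement.
samples = list(combinations(dev, r))
mu2_z = sum(sum(sample) ** 2 for sample in samples) / len(samples)

generalized = r * mu2_bar + r * (r - 1) * mu11_bar    # formula (1.15)
finite = Fraction(r * (s - r), s - 1) * mu2_bar       # formula (1.14)
```

Replacing `mu11_bar` by the square of the first central moment, which is zero, gives $r\bar\mu_{2:x}$, the value for sampling with replacement.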

Thus, either of the two courses is possible in the case of sampling from a single
population. However, if one wishes to get general formulas which hold for both
infinite and finite sampling, he should follow the second course. Similarly, in
order to obtain generalized sampling formulas where the relations between the
variates are unspecified and the populations are assumed to be different, the
second course should be followed.

It appears that Tchouproff [3], [4] was the first to approach the sampling
problem from such a general point of view. However, his methods of derivation
are quite complicated and his results, in general, are difficult to apply to a given
problem [5], [8].

Samples of n are formed from ${}_nU_N$ by choosing one variate from each of the n
populations. A typical sample is

${}_1x_{i_1}, {}_2x_{i_2}, {}_3x_{i_3}, \ldots, {}_rx_{i_r}, \ldots, {}_nx_{i_n}$.

We define [4; p. 472] 


i. , 2 r,— E( Tl x x l r ri Xil ... Tv x t v ) 


(1.16) 


l 1 7^rjfe 


1*1 r 2 ’ i(j I 


JL y _ l <v __ 

n (ti) T ^ { T * Tl — n ( v ) Vv r t rt r„M(i!2"'<i> ~ 


where k represents the number of possible terms of the given form; S v means v 
times the sum for unequal values of r L , r 2 ■ ■ r v and n M = n(n — 1) • •• 
(n — v + 1). 


4. Moments and product moments of sample moments. The $t$th moment of
the $j$th sample is denoted by ${}_jm_t$. The $s$th moment of ${}_jm_t$ for all $j$ is denoted by
$\mu'_{s:m_t}$, where the prime indicates that the moments of the universe are measured
about a fixed point. It follows that

(1.17)  ${}_jm_t = \frac{1}{n}\sum_{r=1}^{n} {}_rx_{i_r}^t$  and  $\mu'_{s:m_t} = E[({}_jm_t)^s]$.

Also, the general product moment, in which the variates of both the sample 
and the universe are measured about a fixed point, is defined by 


(1.18) 


ijWi5 i Bit,]* 


As an illustration of the methods used to derive the formulas of this section, 
consider a special case of (1.18) when si = 2 and s, = 0, (i = 2,3, • • • , v). Then 


= rM21 "h $2 rir 2 ai.( I ■ 

71 Lr=l _J 


Therefore, by (1.1), (1.2) and (1.16), we get 


(1.19) 


Vz.mi = [WM2( + n<2> Mf.il. 


Using the formulas [20; p. 34] relating products of power sums and power
products to expand expressions of the type E(,mi[,m]] > 3 m]’), we give, in the

tables below, formulas for moments and product moments of sample moments
through weight six. The number in a cell and the coefficient, in the same
column, at the top of the table should be taken as the coefficient of the moment
which is found in the same vertical division. The coefficients in the vertical
division are coefficients of the entire right members of the formulas for the
respective moments.

Terms of the form non if h = U - ■ ■ = t r = t, are sometimes written 

j^r + l t 

The numbers in the cells of the tables are identical with the numbers in the 
cells of the tables given by Dwyer [19; p. 30] for the expected value of partition 
products. 


5. Moments of central moments of samples of n. The $t$th central moment of
the $j$th sample is denoted by ${}_j\bar m_t$. Then,

(1.20)  ${}_j\bar m_t = \frac{1}{n}\sum_{r=1}^{n} ({}_rx_{i_r} - {}_jm_1)^t$

and

(1.21)  $\mu'_{s:\bar m_t} = E\Big[\frac{1}{n}\sum_{r=1}^{n} ({}_rx_{i_r} - {}_jm_1)^t\Big]^s$.

TABLE I

After writing $({}_rx_{i_r} - {}_jm_1)^t$ as the sum of the general term of a binomial series
and then expanding the resulting right member of (1.21) as a product of power
sums [20; p. 19], we get


( 1 . 22 ) 


. _ _ V 1 s - 






U t) 

where ]L r i = s > 2 ®i*i = p and > n > • ■ are the numbers of the repeated 

3=1 3—1 

parts of s. 

The mean of the tth central moment takes the following simple form, 


(1.23) 







where the moments in the right member of (1.23) through weight six are given 
in the tables of section four. Also, 


(1.24) 

(1.25) 

(1.26) 


Vsimj = Vs m 2 2 p.21 mimi "h M4.m, . 

'/18 m 2 = Vs ">2 6 PS2 mjtlij "1“ 3 P41.mim 2 MS mi > 

V2:m a = Vfcm, + 9 r M22 m,m 2 + d/pB.m, — 6VlU*mim 2 m 3 

*4" 4 /Igi'mimg 12 Mil.m.n>7 • 


After substituting from the tables of section four, (1.23) through (1.26) become 


(1.27) 

/U:m] 

„<2> 

^ta¬ 

il 2 

Ml.il- 







„(3) 






(1.28) 

Vl:mj 

= ^ Vs - 

7T 

3/12,1 + 2/*i s]. 







- !>»(»■ 

— 3ii + 3) (in - 

- 4/ij.i) 

+ 

3n (s) (2n - 

" 3)/l2,2 

(1.29) 

nP 












+ 3n (4) (2g! 


Ml:m* 

- i I»“V 

VP 

— 2n + 2)(ju6 - 

- 5/14.1) 

+ 

10n% - 

" 2)/3,2 


+ 10n (3> (n + l)(n — 4)p8,i 2 — 30n (3) (n — 2)^ 2 j,i 
— 10n (4) (3n — 4)g2,i> + 4n C6) jin]. 


(1.30) 
Vim, = - 5ft 3 + 10n 2 - 10ft + 5)04 - fi s ,i) 

71 

+ 15ft (2 V - 4ft 2 + 7n - 5) M 4,2 - 10ft (a, (2ft 2 - 6n + 5W 
(1 31) + 15ft (3) (ft 3 - 4ft 2 + 6ft - 6Wi - 60ft (3 V - 4ft + 5) M3 , 2>1 

+ 15ft ta) (3ft - 5ta» - 20ft ( "V - 3ft 4- 5ta, 13 
4- 45n (4) (2ft - 5W 4 . + 15ft t6) (ft - 5)/i 2i1 * - 

(1.32) V«:*i = ^ [w <2> (ft — 1 )(a» 4 — 4^3,]) 4" n <3> (ft 4- 1 )m 2,2 — ft (4> (2jti2,i2 — /n*)]. 

Vi<i) = ^5 [ft <2) (ft - l) 2 (/i« ~ 6 m5,i) 4- 3ft (21 (ft — l)(ft 2 — 2ft 4 - 5)/j4 (2 

— 2ft ( ' ) (3ft 2 — 6w 4* 5)jua,a + ft® (ft 3 — 3ft z 4" 9ft — 15)jusi 

(1.33) - 3ft (aJ (ft - l)(ft - 5ta,n - 12ft t8, (n 2 - 4n 4- 6ta,v 
4- 4ft U) (3ft - 5ta,i* ~ 3ft (4) (ft 2 - 6ft + 15ta.,n 

4- ft <8) (3/12,1* — Ml 4 )]- 

Wfl, = i[*% - l) 2 (ft - 2)0*, - 6/15,1) - 3ft (2, (ft - 2) 2 (2n - 5 )/i 4 ,2 

71 

4- ft <2) (ft - 2) 2 (ft 2 - 2ft 4- iota,3 

(1.34) - 6ft™(n - 2)(ft 2 - 6ft 4- 20ta,,.i + 3n (3, (ft - 2)(7n - 10ta,i. 
4- 3n (3) (3ft 2 - 12n + 20ta s 4- 4n (4, (ft - 2)(n - 10ta, I5 

4- 9n (4) (n 2 — 8ra 4- 20taz,i2 — 4ft (8) (3/12,1* — /n«)]. 


6. The variance of the variance of samples of n. The variance of the variance
of samples of n, when the moments of the universe are measured about a fixed
point, is defined as

(1.35)  $\bar\mu'_{2:\bar m_2} = \mu'_{2:\bar m_2} - [\mu'_{1:\bar m_2}]^2$.

Therefore, from (1.27) and (1.32),


(1.36) 


Wm, — -j [ft <2> (ft — l)(/l* — 4/l»,i) d - ft® (ft — l)/*2,2 — ft (4) (2/l 3ll 2 — /l!*)] 



0*. - Mi,i) 2 - 


Tchouproff [4; p. 492] gave a formula (8) for the variance of the sample
variance but his result is unwieldy due to the fact that moments of the universe
are measured about the mean.

7. Conventional infinite sampling formulas derived from generalized sampling
formulas. The term “infinite sampling” is to be interpreted as meaning:
sampling from an unlimited supply or sampling from a limited supply with
repetitions permitted. In each of these situations the variates are independent [5; p. 79].
First, it is assumed that the n populations are identical, that is,
${}_1X = {}_2X = \cdots = {}_nX$. This assumption results in the fact that, for a fixed $t$,
${}_1\mu_t = {}_2\mu_t = \cdots = {}_n\mu_t$ and ${}_1\bar\mu_t = {}_2\bar\mu_t = \cdots = {}_n\bar\mu_t$. Therefore, under the
assumption of identical populations, every moment may be interpreted as either
the moment of n identical populations or as the moment of a single population.
The only other assumption is that the sampling is “infinite”.

From the condition of independence [3; p. 141], we have 


r,<) = (E ^(E r^) ... 


Therefore, 


rir 2 — riMU i^MU • • r„Mt„ ■ 


Combining the condition of independence with that of identical populations, we 
have 

(1.37) r l r 2‘ ftlMllU **U ^ V l^MU ' * ' UrMU Mu Mu ' * * Mu 


By (1.16) and (1.37), we may write

(1.38)  $\mu_{t_1 t_2 \cdots t_v} = \mu_{t_1}\mu_{t_2} \cdots \mu_{t_v}$.

Since the only terms of the generalized sampling formulas which are affected
by the assumption of “infinite sampling” are those of the form $\mu_{t_1 t_2 \cdots t_v}$, the
problem of obtaining conventional infinite sampling formulas from generalized
sampling formulas is, in practice, a mechanical one. Simply write terms of the
form $\mu_{t_1 t_2 \cdots t_v}$, which appear in a generalized sampling formula, as $\mu_{t_1}\mu_{t_2} \cdots \mu_{t_v}$
and one automatically obtains the corresponding infinite sampling formula.

As an illustration of the method, consider the generalized sampling formula
(1.36) for the variance of the sample variance. When (1.38) is utilized to change
it into the corresponding infinite sampling formula, (1.36) becomes

(1.39)  $\bar\mu'_{2:\bar m_2} = \frac{n^{(2)}}{n^4}\,[(n - 1)(\mu_4 - 4\mu_3\mu_1) - (n - 3)\mu_2^2 + 2(2n - 3)(2\mu_2\mu_1^2 - \mu_1^4)]$,

which is the usual formula [20; p. 75] for the variance of the sample variance
when the moments of the universe are measured about a fixed point. If it is
assumed that the moments of ${}_nU_N$ are measured about the mean, formula (1.39)
becomes

(1.40)  $\bar\mu_{2:\bar m_2} = \frac{n^{(2)}}{n^4}\,[(n - 1)\bar\mu_4 - (n - 3)\bar\mu_2^2]$,

which was published by “Student” [1; p. 3] in 1908.
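Formula (1.40) can be verified exactly for a small population by listing every ordered sample drawn with replacement, which is one of the two situations covered by “infinite sampling”. The sketch below is illustrative only; the population and the helper name are hypothetical, and $n^{(2)}/n^4$ is written as $(n - 1)/n^3$.

```python
from fractions import Fraction
from itertools import product


def central_moment(values, t):
    """t-th moment of the values about their own mean, in exact arithmetic."""
    mean = sum(Fraction(v) for v in values) / len(values)
    return sum((Fraction(v) - mean) ** t for v in values) / len(values)


population = [0, 1, 3, 8]   # hypothetical parent population
n = 3                       # sample size

# Variance (divisor n) of every ordered sample drawn with replacement.
sample_variances = [central_moment(sample, 2)
                    for sample in product(population, repeat=n)]

mean_of_variance = sum(sample_variances) / len(sample_variances)
variance_of_variance = central_moment(sample_variances, 2)

mu2 = central_moment(population, 2)
mu4 = central_moment(population, 4)
student = Fraction(n - 1, n ** 3) * ((n - 1) * mu4 - (n - 3) * mu2 ** 2)
```

The same enumeration also returns the mean of the sample variance, which equals $\frac{n - 1}{n}\bar\mu_2$.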
8. Conventional finite sampling formulas derived from generalized sampling
formulas. The term “finite sampling” is to be interpreted as meaning: sampling
from a limited supply when repetitions are not permitted.

In order to reduce generalized sampling formulas to the corresponding formulas
for finite sampling, the assumptions are made that the n populations are identical
and that N and n are finite, N > n. The selection of variates which enter each
sample is restricted in the following manner. If a variate having a given
post-subscript is chosen, then no other variate having the same post-subscript may be
chosen for the same sample.

Now it is evident that terms of the form $\mu_{t_1 t_2 \cdots t_v}$ must be redefined on the
basis of the preceding assumptions. From the expansions [20; p. 32] of power
product sums in terms of products of power sums, we get the formulas for $\mu_{t_1 t_2 \cdots t_v}$
which are given in the following tables.

The formulas in the tables of this section are called transformation formulas for
finite sampling or more briefly transformation formulas.

The transformation of generalized sampling formulas into corresponding
finite sampling formulas is illustrated by the substitution of $\frac{N\mu_1^2 - \mu_2}{N - 1}$ for $\mu_{1,1}$
in (1.27). We get

(1.41)  $\mu'_{1:\bar m_2} = \frac{N(n - 1)}{n(N - 1)}\,[\mu_2 - \mu_1^2]$,

which is the well-known finite sampling formula for the mean of the variance of
samples of n.
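The substitution leading to (1.41) can be followed numerically. The sketch below is an illustration under assumptions of our own: the parent population is hypothetical, (1.27) is taken in the form $\frac{n^{(2)}}{n^2}[\mu_2 - \mu_{1,1}]$, and the result is compared with the mean of the variance over every sample drawn without replacement.

```python
from fractions import Fraction
from itertools import combinations

population = [2, 3, 5, 11, 13, 17]   # hypothetical parent population
N, n = len(population), 3


def sample_variance(sample):
    """Second central moment of a sample (divisor n)."""
    mean = Fraction(sum(sample), len(sample))
    return sum((x - mean) ** 2 for x in sample) / len(sample)


# Direct enumeration of all samples of n without replacement.
all_samples = list(combinations(population, n))
mean_of_variance = sum(sample_variance(smp) for smp in all_samples) / len(all_samples)

# Moments of the parent population about a fixed point (the origin).
mu1 = Fraction(sum(population), N)
mu2 = Fraction(sum(x * x for x in population), N)

# Transformation formula for finite sampling, substituted in (1.27).
mu11 = (N * mu1 ** 2 - mu2) / (N - 1)
generalized = Fraction(n * (n - 1), n ** 2) * (mu2 - mu11)

# Closed form (1.41).
finite = Fraction(N * (n - 1), n * (N - 1)) * (mu2 - mu1 ** 2)
```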

From this and the preceding section it is evident that the generalized sampling
formulas may be considered as formulas for either infinite or finite sampling
depending upon the interpretation given to terms of the form $\mu_{t_1 t_2 \cdots t_v}$.

9. Transformation of infinite sampling formulas into corresponding finite
sampling formulas. It is a well-known fact that infinite sampling formulas may
be obtained from those for finite sampling by letting the size of the parent
population become infinite. But, prior to this paper, apparently no one has presented a
method of obtaining finite sampling formulas from infinite sampling formulas.
However, by making use of the relations between finite, infinite, and generalized
sampling, we shall demonstrate that it is possible to transform any infinite
sampling formula into the corresponding finite sampling formula.

TABLE II

Since the infinite sampling formulas are obtained from the generalized
sampling formulas by replacing

$\mu_{t_1 t_2 \cdots t_v}$ by $\mu_{t_1}\mu_{t_2} \cdots \mu_{t_v}$

it follows that generalized sampling formulas may be obtained from the infinite
formulas by replacing

(1.42)  $\mu_{t_1}\mu_{t_2} \cdots \mu_{t_v}$ by $\mu_{t_1 t_2 \cdots t_v}$.

However, it must be emphasized that the application of (1.42) demands formulae
which are expressed in terms of moments of sample moments rather than central
moments of sample moments (although the sample moments may be measured
about a fixed point or about the mean) and the moments of the universe must be
measured about a fixed point. The reason for these restrictions is to insure that
each term is accounted for individually.

After replacements (1.42) are made in the formula for sampling from an
infinite population, the resulting formula is the corresponding generalized one.
The step to the corresponding finite sampling formulas is simply the one outlined
in section eight, namely, the use of the transformation formulas.

We shall consider, as the first illustration, the infinite sampling formula for
the mean of the sample variance when the moments of the parent population are
measured about the mean. The formula is

(1.43)  $\mu_{1:\bar m_2} = \frac{n - 1}{n}\,\bar\mu_2$.

When (1.43) is expressed in terms of moments of the parent population about a
fixed point, we have

(1.44)  $\mu'_{1:\bar m_2} = \frac{n - 1}{n}\,[\mu_2 - \mu_1^2]$.

Following (1.42), $\mu_1^2$ is replaced by $\mu_{1,1}$ and (1.44) becomes (1.27). The use of
the transformation formula for $\mu_{1,1}$ gives (1.41) which, when the moments of the
parent population are measured about the mean, becomes

(1.45)  $\mu_{1:\bar m_2} = \frac{N(n - 1)}{n(N - 1)}\,\bar\mu_2$.

Infinite sampling formulas expressed in terms of moment-functions may be
similarly transformed into the corresponding finite sampling formulas. For
example, Craig [9; p. 57] gives the second Thiele seminvariant of the variance
of samples as

(1.46)  $\lambda_{2:\bar m_2} = \frac{(n - 1)^2}{n^3}\lambda_4 + \frac{2(n - 1)}{n^2}\lambda_2^2$.

First, we express (1.46) in terms of moments about a fixed point by use of the
formulas relating Thiele seminvariants and moments [9; p. 12]. We also recall
that the resulting formula should be expressed in terms of moments of sample
moments rather than in terms of central moments of sample moments. We
obtain


(1.47)  $\mu'_{2:\bar m_2} = \frac{n - 1}{n^3}\,[(n - 1)\mu_4 - 4(n - 1)\mu_3\mu_1 + (n^2 - 2n + 3)\mu_2^2 - 2(n - 2)(n - 3)\mu_2\mu_1^2 + (n - 2)(n - 3)\mu_1^4]$.

The next step is to transform (1.47) into the corresponding generalized sampling
formula by use of (1.42). We obtain (1.32). Since we desire to obtain the
finite sampling formula which exactly corresponds to (1.46), it is necessary to
transform (1.32) from the second moment of $\bar m_2$ to the variance of $\bar m_2$ and we get
(1.36). Next the transformation formulas are applied to (1.36). When the
moments of the parent population are measured about the mean and are replaced
by Thiele seminvariants, (1.36) becomes

(1.48) Umi n3 (N - DKN - 2) (IV - 3) 1 ^ Nn N 71 

+ 2(N s ?i - 3 Nn - 3N + 3n + 3)\1]. 

Formula (1.48) gives the second Thiele seminvariant of the variance of samples of
n drawn from a finite parent population of N. When $N \to \infty$ in (1.48), we
obtain immediately (1.46).

It is generally true that infinite sampling formulas are more easily derived than
are the corresponding finite sampling formulas. The methods of this section
make it possible to derive the desired sampling formulas for the infinite parent
population and then transform these infinite sampling formulas into the
corresponding finite sampling formulas.


II. Moment Function Adjustments for Grouped Data 


A given distribution of discrete variates may be grouped in “k groupings of k”.
We desire to find the correction which eliminates the error made in replacing a
given moment of the original distribution by the average of the corresponding
moments of the k grouped-distributions.

Formulas for the adjustments for moments of a grouped-distribution of
discrete variates were first given (without proof) in the Editorial of Vol. I, No. 1
of the Annals of Mathematical Statistics. Later, more satisfactory derivations
of adjustment formulas were given by Abernethy [24], Craig [25] and Carver [26].
However, it was observed by Carver [26; p. 162] that the developments of
Abernethy and Craig are adjustments about a fixed point and that they fail to
hold for the case of expectations of central moments if we accept the definition

$\mu_{1:\bar\mu_t} = \frac{1}{k}\sum_{r=1}^{k} {}_r\bar\mu_t$,  $(t = 2, 3, \ldots)$.

Here ${}_r\bar\mu_t$ represents the $t$th central moment of the $r$th grouped-distribution. The
formula for the true value of $\mu_{1:\bar\mu_2}$ was supplied by Carver [26; p. 162] but he did
not indicate a general method which might be used for the derivation of $\mu_{1:\bar\mu_t}$,
$(t > 2)$.

A distribution of discrete variates grouped in “k groupings of k” is a special
case of a universe of n finite populations and hence the methods and formulas
for the expectations of population moments are applicable to our present
problem.
It is found that the adjustment formulas for moment-functions of grouped
data involve central moments of a rectangular distribution. It will be
convenient for our present purposes to give a brief treatment of the
moment-functions of a rectangular distribution.

1. Moment-functions of a rectangular distribution. Consider the rectangular
distribution of discrete variates,

(2.1)  $h, 2h, 3h, \ldots, kh$.

It is readily shown that the moment generating function of (2.1),

(2.2)  $G_x(\theta) = \mu_{0:R} + \mu_{1:R}\theta + \mu_{2:R}\frac{\theta^2}{2!} + \cdots + \mu_{n:R}\frac{\theta^n}{n!} + \cdots$,

may be written

(2.3)  $G_x(\theta) = \frac{e^{\frac{1}{2}(k+1)h\theta}\,\sinh\frac{1}{2}kh\theta}{k\,\sinh\frac{1}{2}h\theta}$.

Setting the expansion of the right member of (2.3) equal to the right member of
(2.2) and equating coefficients of like powers of $\theta$, we obtain the following
recursion formula for the moments of (2.1)


(2.4) 




+ (- 1 ) 


r—1 (n + l) W Lf-1 


r! 


— h r 1 g„_r+1:« + • • > — ft h , 


where $\mu_{n:R}$ represents the $n$th moment of a rectangular distribution. Formulas
for $\mu_{n:R}$, $(n = 0, 1, \ldots, 10)$ are given below. See Sasuly [27; p. 27].

(2.5)
$\mu_{0:R} = 1$.
$\mu_{1:R} = \frac{1}{2}(k + 1)h$.
$\mu_{2:R} = \frac{1}{6}(k + 1)(2k + 1)h^2 = \frac{1}{3}(2k + 1)h\,\mu_{1:R}$.
$\mu_{3:R} = \frac{1}{4}(k + 1)^2 k h^3 = kh\,\mu_{1:R}^2$.
$\mu_{4:R} = \frac{1}{5}(3k^2 + 3k - 1)h^2\,\mu_{2:R}$.
$\mu_{5:R} = \frac{1}{3}(2k^2 + 2k - 1)h^2\,\mu_{3:R}$.
$\mu_{6:R} = \frac{1}{7}(3k^4 + 6k^3 - 3k + 1)h^4\,\mu_{2:R}$.
$\mu_{7:R} = \frac{1}{6}(3k^4 + 6k^3 - k^2 - 4k + 2)h^4\,\mu_{3:R}$.
$\mu_{8:R} = \frac{1}{15}(5k^6 + 15k^5 + 5k^4 - 15k^3 - k^2 + 9k - 3)h^6\,\mu_{2:R}$.
$\mu_{9:R} = \frac{1}{5}(2k^6 + 6k^5 + k^4 - 8k^3 + k^2 + 6k - 3)h^6\,\mu_{3:R}$.
$\mu_{10:R} = \frac{1}{11}(3k^8 + 12k^7 + 8k^6 - 18k^5 - 10k^4 + 24k^3 + 2k^2 - 15k + 5)h^8\,\mu_{2:R}$.
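The closed forms (2.5) can be compared with moments computed directly from the variates $h, 2h, \ldots, kh$. The sketch below is a check in exact rational arithmetic; the function names are ours and the values of $k$ and $h$ used are arbitrary.

```python
from fractions import Fraction


def raw_moment(k, h, t):
    """t-th moment about the origin of the variates h, 2h, ..., kh."""
    return sum((j * h) ** t for j in range(1, k + 1)) / k


def closed_forms(k, h):
    """Moments of orders 0 to 10 from the closed forms (2.5)."""
    mu = {0: Fraction(1)}
    mu[1] = Fraction(1, 2) * (k + 1) * h
    mu[2] = Fraction(1, 3) * (2 * k + 1) * h * mu[1]
    mu[3] = k * h * mu[1] ** 2
    mu[4] = Fraction(1, 5) * (3 * k**2 + 3 * k - 1) * h**2 * mu[2]
    mu[5] = Fraction(1, 3) * (2 * k**2 + 2 * k - 1) * h**2 * mu[3]
    mu[6] = Fraction(1, 7) * (3 * k**4 + 6 * k**3 - 3 * k + 1) * h**4 * mu[2]
    mu[7] = Fraction(1, 6) * (3 * k**4 + 6 * k**3 - k**2 - 4 * k + 2) * h**4 * mu[3]
    mu[8] = Fraction(1, 15) * (5 * k**6 + 15 * k**5 + 5 * k**4 - 15 * k**3
                               - k**2 + 9 * k - 3) * h**6 * mu[2]
    mu[9] = Fraction(1, 5) * (2 * k**6 + 6 * k**5 + k**4 - 8 * k**3
                              + k**2 + 6 * k - 3) * h**6 * mu[3]
    mu[10] = Fraction(1, 11) * (3 * k**8 + 12 * k**7 + 8 * k**6 - 18 * k**5
                                - 10 * k**4 + 24 * k**3 + 2 * k**2
                                - 15 * k + 5) * h**8 * mu[2]
    return mu


h = Fraction(3, 2)   # arbitrary class interval
agrees = all(closed_forms(k, h)[t] == raw_moment(k, h, t)
             for k in (1, 2, 5, 9) for t in range(11))
```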
The deviations about the mean of (2.1) are

(2.6)  $-\frac{1}{2}(k - 1)h, -\frac{1}{2}(k - 3)h, \ldots, \frac{1}{2}(k - 3)h, \frac{1}{2}(k - 1)h$.

Therefore,

(2.7)  $\bar\mu_{2n+1:R} = 0$.

If we denote (2.6) by $a$, we have

(2.8)  $G_a(\theta) = \frac{\sinh\frac{1}{2}kh\theta}{k\,\sinh\frac{1}{2}h\theta}$.

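The recursion formula for central moments of (2.1) is

(2.9)  $\frac{(2n + 1)^{(1)}}{1!}\bar\mu_{2n:R} + \frac{h^2}{2^2}\frac{(2n + 1)^{(3)}}{3!}\bar\mu_{2n-2:R} + \cdots + \frac{h^r}{2^r}\frac{(2n + 1)^{(r+1)}}{(r + 1)!}\bar\mu_{2n-r:R} + \cdots = \frac{k^{2n}h^{2n}}{2^{2n}}$.

Formulas for $\bar\mu_{2n:R}$, $(n = 0, 1, \ldots, 5)$ are given below. See [27; p. 27].

(2.10)
$\bar\mu_{0:R} = 1$,
$\bar\mu_{2:R} = \frac{1}{12}(k^2 - 1)h^2$,
$\bar\mu_{4:R} = \frac{1}{20}(3k^2 - 7)h^2\,\bar\mu_{2:R}$,
$\bar\mu_{6:R} = \frac{1}{112}(3k^4 - 18k^2 + 31)h^4\,\bar\mu_{2:R}$,
$\bar\mu_{8:R} = \frac{1}{960}(5k^6 - 55k^4 + 239k^2 - 381)h^6\,\bar\mu_{2:R}$,
$\bar\mu_{10:R} = \frac{1}{2816}(3k^8 - 52k^6 + 410k^4 - 1636k^2 + 2555)h^8\,\bar\mu_{2:R}$.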
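A similar check applies to the central moments. The sketch below compares (2.10) with central moments computed from the deviations (2.6), and confirms (2.7) for the odd orders; names and test values are ours.

```python
from fractions import Fraction


def central_moment(k, h, t):
    """t-th moment of h, 2h, ..., kh about their mean (k + 1)h / 2."""
    mean = Fraction(k + 1, 2) * h
    return sum((j * h - mean) ** t for j in range(1, k + 1)) / k


def closed_central(k, h):
    """Even central moments of orders 0 to 10 from the closed forms (2.10)."""
    m2 = Fraction(k * k - 1, 12) * h**2
    return {
        0: Fraction(1),
        2: m2,
        4: Fraction(1, 20) * (3 * k**2 - 7) * h**2 * m2,
        6: Fraction(1, 112) * (3 * k**4 - 18 * k**2 + 31) * h**4 * m2,
        8: Fraction(1, 960) * (5 * k**6 - 55 * k**4 + 239 * k**2 - 381) * h**6 * m2,
        10: Fraction(1, 2816) * (3 * k**8 - 52 * k**6 + 410 * k**4
                                 - 1636 * k**2 + 2555) * h**8 * m2,
    }


h = Fraction(2, 3)   # arbitrary class interval
even_agree = all(closed_central(k, h)[t] == central_moment(k, h, t)
                 for k in (1, 2, 3, 6, 10) for t in (0, 2, 4, 6, 8, 10))
odd_vanish = all(central_moment(k, h, t) == 0
                 for k in (1, 2, 3, 6, 10) for t in (1, 3, 5, 7, 9))
```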

From the relation which connects Thiele seminvariants and the moment
generating function, we get, see [25; p. 57],

(2.11)
$\lambda_{1:R} = \frac{1}{2}(k + 1)h$,
$\lambda_{2n+1:R} = 0$,
$\lambda_{2n:R} = (-1)^{n+1}\frac{B_n h^{2n}(k^{2n} - 1)}{2n}$,  $n = 1, 2, 3, \ldots$,

where $\lambda_{n:R}$ represents the $n$th Thiele seminvariant of a rectangular distribution
of discrete variates and $B_n$, $(n = 1, 2, \ldots)$, the Bernoulli numbers $\frac{1}{6}, \frac{1}{30}, \ldots$.
In each of the cases considered in this section, corresponding formulas may be
found for a rectangular distribution of continuous variates by setting $h = m/k$
(which makes the range $m$ with $k$ subdivisions) and then letting $k \to \infty$.

2. Adjustments for moments. As our basic distribution we consider the set of
discrete variates, $x_i$, (i = 1, 2, ..., N), where some of the $x_i$'s may not be
distinct. We assume that the given distribution is grouped in "k groupings
of k".

When $x_i$ is placed in the rth position of a class, the limits of the class are
$x_i - (r-1)h$ and $x_i + (k-r)h$ and the class mark is $x_i + \frac{k-(2r-1)}{2}h$.
Thus, when the class mark is used as the value of $x_i$, the quantity $\frac{k-(2r-1)}{2}h$
is added to the true value of $x_i$. Therefore, when the expected value of a
particular moment for "k groupings of k" is found, each variate has made a
definite contribution as it was placed in each of the k positions of a class.

For convenience, we define

(2.12)
\[ e_r = \frac{k-(2r-1)}{2}\,h, \qquad (r = 1, 2, \ldots, k). \]


As was previously indicated, the expected value of a given moment involves
the contribution of each variate as it occupies the k class positions. A
convenient method of finding these contributions is by means of a universe ${}_kU_N$
which is composed of the populations ${}_rX$, (r = 1, 2, ..., k). The rth population
consists of the values of the variates when they occupy the rth position of the
class. Hence ${}_rX$ consists of ${}_rx_i = x_i + e_r$, (i = 1, 2, ..., N).

The notation for moments is the same as that of Part I. Since ${}_kU_N$ is of the
same form as the universe studied in Part I, we use the definitions (1.1) of that
part.

The expected value of the tth moment is

\[ {}_{\mu_1}\mu_t = \frac{1}{k}\sum_{r=1}^{k}\frac{1}{N}\sum_{i=1}^{N}(x_i + e_r)^{t}. \]


Many devices have been used by previous writers [24; p. 269], [25; p. 57],
[26; p. 157], to evaluate terms of the form $\frac{1}{k}\sum_{r=1}^{k} e_r^{s}$. However, it should be
noticed that the quantities $e_r$, (r = 1, 2, ..., k), are respectively identical
with the deviations (2.6) about the mean of a rectangular distribution of discrete
variates. It follows that

\[ \frac{1}{k}\sum_{r=1}^{k} e_r^{s} = \bar\mu_{s:R}. \]

And since $\bar\mu_{2s+1:R} = 0$, we have

(2.13)
\[ {}_{\mu_1}\mu_t = \sum_{s=0}^{[t/2]} \binom{t}{2s}\,\mu_{t-2s}\,\bar\mu_{2s:R}. \]

Formulas for $\bar\mu_{2s:R}$, (s = 0, 1, ..., 5) are given by (2.10).

If the class marks are selected as the unit of x, we set h = 1 in (2.10). If the
class interval is chosen as the unit of x, we set h = 1/k in (2.10). If k
consecutive values of the discrete variable are grouped in a frequency class of width
m, we put h = m/k in (2.10).
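
Formula (2.13) can be compared with the definition of the expected moment, namely the average of the tth moment over the k possible class positions. The sketch below does this in exact arithmetic; the function names are illustrative and the data used in any call are arbitrary, neither being taken from the paper.

```python
from fractions import Fraction
from math import comb


def rect_central(k, n, h=1):
    """n-th central moment of the k offsets e_r = (k - (2r - 1)) h / 2 of (2.12)."""
    return sum((Fraction(k - (2 * r - 1), 2) * h) ** n for r in range(1, k + 1)) / k


def raw_moment(xs, t):
    """t-th moment of the ungrouped variates about the origin."""
    return sum(Fraction(x) ** t for x in xs) / len(xs)


def expected_grouped_moment(xs, t, k, h=1):
    """Right-hand side of (2.13)."""
    return sum(comb(t, 2 * s) * raw_moment(xs, t - 2 * s) * rect_central(k, 2 * s, h)
               for s in range(t // 2 + 1))


def brute_force(xs, t, k, h=1):
    """Average over the k class positions of the t-th moment of x_i + e_r."""
    total = Fraction(0)
    for r in range(1, k + 1):
        e_r = Fraction(k - (2 * r - 1), 2) * h
        total += sum((Fraction(x) + e_r) ** t for x in xs) / len(xs)
    return total / k
```

The two functions agree exactly for every t because the odd central moments of the offsets vanish, which is the step that reduces the binomial expansion to the even terms kept in (2.13).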

Usually we desire to estimate the value of the moments that would have been
obtained if we had not grouped the data. Therefore (2.13) is solved for the
moments of the ungrouped data. We have

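(2.14)
\[ \mu_t = \sum_{s=0}^{[t/2]} \binom{t}{2s}\,P_{2s}\;{}_{\mu_1}\mu_{t-2s}, \]

wherein

\[
P_{2s} = \sum \frac{(-1)^{p}\,(2s)!\,p!\;\bar\mu_{2p_1:R}^{\pi_1}\,\bar\mu_{2p_2:R}^{\pi_2}\cdots\bar\mu_{2p_u:R}^{\pi_u}}
{[(2p_1)!]^{\pi_1}\,[(2p_2)!]^{\pi_2}\cdots[(2p_u)!]^{\pi_u}\;\pi_1!\,\pi_2!\cdots\pi_u!},
\]

the summation being taken for every possible product of moments for which

\[ \sum_{i=1}^{u} p_i\pi_i = s, \qquad \sum_{i=1}^{u}\pi_i = p. \]

Formulas, corresponding to (2.13) and (2.14), for a distribution of continuous
variates are written by replacing the moment symbols for discrete variates by
those for continuous variates.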

3. Adjustments for central moments. Consider the universe ${}_kU_N$ which consists
of the populations ${}_rX$, (r = 1, 2, ..., k), where ${}_rX$ is the rth grouped-distribution.

The expected value of the tth central moment of the k grouped-distributions is
given by (1.3), (1.4) and (1.5) of Part I, where now ${}_{\mu_1}\mu_t$ is given by (2.13) of
the preceding section. Thus, the development of this section is identical with
that of section one of Part I with the single exception that ${}_{\mu_1}\mu_t = \mu_t$ no longer
holds but is replaced by ${}_{\mu_1}\mu_t = \mu_t$ + a correction. Therefore, the formulas for
the adjustments for central moments may be obtained immediately from the
formulas derived in section one, Part I, if the corrections of the preceding section
are inserted. We have

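(2.15)
\[ {}_{\mu_1}\bar\mu_2 = \bar\mu_2 + \bar\mu_{2:R} - \bar\mu_{2:\mu_1}, \]

(2.16)
\[ {}_{\mu_1}\bar\mu_3 = \bar\mu_3 + 6\mu_1\bar\mu_{2:\mu_1} - 3\bar\mu_{11:\mu_1\mu_2} + 2\bar\mu_{3:\mu_1}, \]

(2.17)
\[
\begin{aligned}
{}_{\mu_1}\bar\mu_4 = {}& \bar\mu_4 + 6\bar\mu_2\bar\mu_{2:R} + \bar\mu_{4:R}
 + 6(\bar\mu_2 - 2\mu_1^{2} + \bar\mu_{2:R})\,\bar\mu_{2:\mu_1}\\
& + 12\mu_1\bar\mu_{11:\mu_1\mu_2} - 12\mu_1\bar\mu_{3:\mu_1} - 4\bar\mu_{11:\mu_1\mu_3}
 + 6\bar\mu_{21:\mu_1\mu_2} - 3\bar\mu_{4:\mu_1}.
\end{aligned}
\]

The moments of the ungrouped data can be obtained readily from formulas
(2.15) through (2.17).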

Adjustment formulas for central moments of a distribution of continuous
variates may be obtained from (2.13) by replacing the moment symbols for
discrete variates by those for continuous variates and taking the moments about
the mean. Also, it may be observed that adjustment formulas for central
moments of a distribution of continuous variates may be obtained from formulas
(1.3), (1.4) and (1.5) of Part I, provided the moment symbols are exchanged as
indicated above and terms of the form $\bar\mu_{rs:\mu_u\mu_v}$ are set equal to zero.

4. Usual adjustments for Thiele seminvariants. The usual adjustments for
Thiele seminvariants, for the univariate discrete population, may be developed
directly by use of one of the fundamental properties of Thiele seminvariants.

It is assumed (see [25; p. 55]) that k consecutive values of the discrete variable
are grouped in a frequency class of width m. The k smaller intervals of width
m/k = h go to make up the class width m, the actual points representing the k
values of the variable being plotted at the centers of the sub-intervals. Now,
let us suppose that each of the k consecutive boundary points of the subintervals
is as likely to be chosen as a boundary point of the larger intervals as any other.
Then, if $x_t$ is the class mark of the tth frequency class, for any true value, x, of
the discrete variable included in this frequency class, we have

\[ x_t = x + e_r, \]

in which x and $e_r$ are independent variables and $e_r$ takes on the k values (2.12)
with equal relative frequencies 1/k.

Since we have noted that the equally likely values which $e_r$ may take on are
deviations about the mean of a rectangular distribution of discrete variates, we
employ the cumulative property of Thiele seminvariants [9; p. 4] and obtain
directly

(2.18)
\[ \lambda'_t = \lambda_t + \lambda_{t:R}, \qquad (t = 1, 2, \ldots), \]

where $\lambda'_t$ is the tth seminvariant computed from the grouped data, $\lambda_t$ is the
tth seminvariant computed from the ungrouped data and $\lambda_{t:R}$ is defined by (2.11).

Formulas corresponding to (2.18), for special values of t, are given by Craig
[25; p. 57]. However, the present development indicates the dependence of
adjustment formulas on central moments of a rectangular distribution and
provides a general formula for these adjustments which is expressed completely in
terms of Thiele seminvariants.


5. New adjustments for Thiele seminvariants. If we accept the definition

\[ {}_{\mu_1}\lambda_t = \frac{1}{k}\sum_{r=1}^{k} {}_r\lambda_t, \qquad (t = 2, 3, \ldots), \]

then (2.18) is at best only an approximation formula. We now desire exact
formulas for ${}_{\mu_1}\lambda_t$ for the case of a grouped-distribution of discrete variates.
First (1.9) is used and terms of the form ${}_{\mu_1}\bar\mu_t$ are evaluated in terms
of central moments by (1.3). Then terms of the form ${}_{\mu_1}\mu_t$ are evaluated by
(2.13) and finally the relations between moments and Thiele seminvariants are
employed. Exact formulas for the expected values of the second, third, and
fourth Thiele seminvariants for grouped-distributions of discrete variables are
given below.

(2.19)
\[ {}_{\mu_1}\lambda_2 = \lambda_2 + \lambda_{2:R} - \bar\mu_{2:\mu_1}. \]

(2.20)
\[ {}_{\mu_1}\lambda_3 = \lambda_3 + 6\lambda_1\bar\mu_{2:\mu_1} - 3\bar\mu_{11:\mu_1\mu_2} + 2\bar\mu_{3:\mu_1}. \]

(2.21)
\[
\begin{aligned}
{}_{\mu_1}\lambda_4 = {}& \lambda_4 + \lambda_{4:R} + 12[\lambda_2 - 2\lambda_1^{2} + \lambda_{2:R}]\,\bar\mu_{2:\mu_1}
 + 24[\bar\mu_{11:\mu_1\mu_2} - \bar\mu_{3:\mu_1}]\,\lambda_1\\
& - 4\bar\mu_{11:\mu_1\mu_3} + 12\bar\mu_{21:\mu_1\mu_2} - 6\bar\mu_{4:\mu_1} - 3\bar\mu_{2:\mu_2}.
\end{aligned}
\]

Formulas for Thiele seminvariants of ungrouped data in terms of expectations
may be obtained from (2.19) through (2.21).

Adjustment formulas for Thiele seminvariants of a distribution of continuous
variates are given by Langdon and Ore [23; p. 231] and Craig [25; p. 57]. If we
denote the tth Thiele seminvariant of a distribution of continuous variates by
$L_t$, then

(2.22)
\[ {}_{\mu_1}L_t = L_t + L_{t:R}, \]

where

(2.23)
\[ L_{2t+1:R} = 0, \qquad L_{2t:R} = (-1)^{t+1}\,\frac{B_t m^{2t}}{2t}, \qquad t = 1, 2, \ldots. \]

Formulas (2.19) through (2.21) may be used for continuous variates by
changing the moment symbols and setting terms of the form $\bar\mu_{rs:\mu_u\mu_v}$
equal to zero.

6. Adjustment formulas applied to a numerical problem. We consider the
arbitrary distribution given in Table III.

TABLE III

An Arbitrary Distribution of Discrete Variates

  v     f   |   v     f   |   v     f   |   F
  1     2   |   4    30   |   7     1   |   2 + 30 + 1 = 33
  2     8   |   5     4   |   8     1   |   8 + 4 + 1 = 13
  3    10   |   6     3   |   9     1   |   10 + 3 + 1 = 14

The three grouped distributions, when the variates are grouped in "groupings
of three," appear in Table IV.

TABLE IV

Distributions Derived from Data of Table III by Making the Three Possible Groupings of Three

        (1)        |        (2)        |        (3)
  Class       f    |  Class       f    |  Class        f
  1-3        20    |  0-2        10    |  -1 to 1      2
  4-6        37    |  3-5        44    |  2-4         48
  7-9         3    |  6-8         5    |  5-7          8
  10-12       0    |  9-11        1    |  8-10         2


Using the fixed point 4, moment-functions are computed for the distribution of
Table III and for each of the distributions of Table IV. These quantities
along with the average of each moment function appear in Table V.

TABLE V

Moment-Functions of the Distributions of Table III and Table IV. Averages of Moment-Functions of Distributions of Table IV

(Each entry is the numerator of a fraction whose denominator is given in the last row.)

Dist.         $\mu_1$   $\mu_2$   $\mu_3$   $\mu_4$   $\bar\mu_2 = \lambda_2$   $\bar\mu_3 = \lambda_3$   $\bar\mu_4$      $\lambda_4$
(1)              9      165       69     1125        9819          -17442      238,849,317    -50,388,966
(2)             -9      171       81     2511       10179          567162      557,840,277    247,004,154
(3)            -30      162      138     1938        8820         1317600      528,282,000    294,904,800
Ave.           -10      166       96     1858        9606          622440      441,657,198    163,839,996
Orig. Dist.    -10      126      116      ...         ...             ...          ...            ...
Denominator     60       60       60       60      (60)^2          (60)^3        (60)^4         (60)^4
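
The groupings of Table IV and the averages of the first four moments in Table V can be reproduced directly from the frequencies of Table III. The following is a minimal sketch, not the author's procedure; the helper names `grouped` and `moment` are illustrative.

```python
from fractions import Fraction

# Frequencies of Table III: variate -> frequency
freq = {1: 2, 2: 8, 3: 10, 4: 30, 5: 4, 6: 3, 7: 1, 8: 1, 9: 1}
K, ORIGIN = 3, 4  # groupings of three; moments taken about the fixed point 4


def grouped(freq, start, k=K):
    """Frequencies by class mark when the classes begin at start, start + k, ..."""
    out = {}
    for v, f in freq.items():
        lower = start + ((v - start) // k) * k   # lower limit of the class holding v
        mark = lower + Fraction(k - 1, 2)        # class mark = middle of the class
        out[mark] = out.get(mark, 0) + f
    return out


def moment(dist, t, origin=ORIGIN):
    """t-th moment of a frequency distribution about the chosen origin."""
    n = sum(dist.values())
    return sum(f * (Fraction(v) - origin) ** t for v, f in dist.items()) / n


# Groupings (1), (2), (3) of Table IV start at 1, 0 and -1 respectively.
groupings = [grouped(freq, s) for s in (1, 0, -1)]
averages = [sum(moment(g, t) for g in groupings) / K for t in (1, 2, 3, 4)]
```

Here `averages` holds the values -10/60, 166/60, 96/60 and 1858/60 of the "Ave." row.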


Table VI gives the expected values of the moment-functions as obtained by
substituting from Table V into the formulas of sections two, three, and five.
Also the expected values, computed from the usual formulas, are given and the
errors which would be made, if the usual formulas were used, are indicated.

TABLE VI

Expected Values of Moment-Functions Computed by Formulas

(Each entry is the numerator of a fraction whose denominator is given in the last row.)

Expectations by   ${}_{\mu_1}\mu_1$   ${}_{\mu_1}\mu_2$   ${}_{\mu_1}\mu_3$   ${}_{\mu_1}\mu_4$   ${}_{\mu_1}\bar\mu_2 = {}_{\mu_1}\lambda_2$   ${}_{\mu_1}\bar\mu_3 = {}_{\mu_1}\lambda_3$   ${}_{\mu_1}\bar\mu_4$   ${}_{\mu_1}\lambda_4$
New Formulas       -10      166       96     1858        9606          622440      441,657,198    163,839,996
Usual Formulas     -10      166       96     1858        9860          642400      416,778,000    133,795,200
Error                                                     254             ...      -24,879,198        ...
Denominator         60       60       60       60      (60)^2          (60)^3        (60)^4         (60)^4



7. Evaluation of $\bar\mu_{2:\mu_1}$. It appears at first that it is necessary to form the
"k groupings of k" in order to evaluate the term $\bar\mu_{2:\mu_1}$ which enters the precise
formula for the expected value of the variance. That was the procedure
followed by Carver [26; p. 161]. However, it is possible to evaluate $\bar\mu_{2:\mu_1}$ from the
ungrouped data without forming a single grouped-distribution.

By definition,

\[ \bar\mu_{2:\mu_1} = \frac{1}{k}\sum_{r=1}^{k}({}_r\mu_1 - \mu_1)^{2}, \]

where ${}_r\mu_1$ is the mean of the rth grouped-distribution and $\mu_1$ is the mean of the
ungrouped distribution. We wish to study the terms ${}_r\mu_1$ and $\mu_1$. Consider a
set of variates $x_t$, (t = 1, 2, ..., s), with corresponding frequencies $f_t$, (t = 1, 2,
..., s). The x's are subject to the condition $x_t - x_{t-1} = 1$, and consequently
some of the f's may be zero. The mean of this distribution is $\frac{\sum xf}{\sum f}$.

We define

\[ F_t = f_t + f_{k+t} + f_{2k+t} + \cdots, \qquad (t = 1, 2, \ldots, k). \]

Then, if a grouped-distribution is formed with $x_i$ in the ith (i = 1, 2, ..., k)
position of a class, the mean of this grouped-distribution is

\[ \frac{\sum xf + \sum_{t=1}^{k} F_t e_t}{\sum f}, \]

where $e_{t-1} = e_k$ if $e_t = e_1$ and $e_{t+1} = e_1$ if $e_t = e_k$. Similarly if a
grouped-distribution is formed with $x_i$ in the (i + 1)st position of a class, the mean is

\[ \frac{\sum xf + \sum_{t=1}^{k} F_t e_{t+1}}{\sum f}. \]

Thus, it is evident that, given the expression for the mean of any
grouped-distribution in which $x_i$ is in the ith position of a class, we may form the
expression for the mean of the grouped-distribution in which $x_i$ is in the (i + 1)st
position of a class by a cyclic permutation of the e's of the given expression.
Therefore, it follows that if we call ${}_r\mu_1$ the mean of the grouped-distribution
in which $x_1$ is in the rth (r = 1, 2, ..., k) position of a class, then

\[ {}_r\mu_1 - \mu_1 = \frac{\sum_{t=1}^{k} F_t e_{r+t-1}}{\sum f}, \qquad (r = 1, 2, \ldots, k). \]

If we define

\[ N = \sum f \quad\text{and}\quad \phi_r = \sum_{t=1}^{k} F_t e_{r+t-1}, \]

then,

\[ \bar\mu_{2:\mu_1} = \frac{1}{kN^{2}}\sum_{r=1}^{k}\phi_r^{2}. \]

Thus, it is evident that $\bar\mu_{2:\mu_1}$ is a function of the frequencies of the variates and
of the $e_r$'s. The fact that the values of the variates do not enter permits
one to calculate its value quickly.

Consider $\bar\mu_{2:\mu_1}$ for the distribution of Table III. We find

\[ \phi_1 = 33e_1 + 13e_2 + 14e_3. \]

Then, by successive cyclic permutations of the e's,

\[ \phi_2 = 33e_2 + 13e_3 + 14e_1, \]

\[ \phi_3 = 33e_3 + 13e_1 + 14e_2. \]

Substituting the values $e_1 = 1$, $e_2 = 0$, $e_3 = -1$ we have $\phi_1 = 19$, $\phi_2 = 1$ and
$\phi_3 = -20$. Therefore,

\[ \bar\mu_{2:\mu_1} = \frac{254}{(60)^{2}}, \]

which is identical with the value which was found when Table V was used.

It follows from the preceding development that 

^ = kN‘ 5 

and if $F_1 = F_2 = \cdots = F_k$ then $\bar\mu_{2:\mu_1}$ is zero.
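
The evaluation described in this section needs only the frequencies. The sketch below computes the sums $F_t$, the quantities $\phi_r$ and the resulting variance of the grouped means for unit spacing (h = 1); the function name is illustrative and not taken from the paper.

```python
from fractions import Fraction


def variance_of_grouped_means(freqs, k):
    """Return (phi, variance) for consecutive unit-spaced variates.

    freqs[j] is the frequency of the (j+1)-th variate; zeros are allowed.
    phi[r] corresponds to phi_{r+1} in the text.
    """
    n = sum(freqs)
    # F_t = f_t + f_{k+t} + f_{2k+t} + ...
    F = [sum(freqs[t::k]) for t in range(k)]
    # e_r = (k - (2r - 1)) / 2 for r = 1, ..., k, with h = 1
    e = [Fraction(k - (2 * r - 1), 2) for r in range(1, k + 1)]
    # phi_r = sum_t F_t e_{r+t-1}, the subscript of e being taken cyclically
    phi = [sum(F[t] * e[(r + t) % k] for t in range(k)) for r in range(k)]
    return phi, Fraction(sum(p * p for p in phi), k * n * n)


phi, var = variance_of_grouped_means([2, 8, 10, 30, 4, 3, 1, 1, 1], 3)
```

With the frequencies of Table III this gives `phi` equal to 19, 1, -20 and `var` equal to 254/3600, in agreement with the value found above.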

8. Conclusion. The results of this paper include: 

1. The derivation of general and specific formulas for the expected values of 
population moment-functions. 





2. The derivation of generalized sampling formulas under the condition that
samples of n are formed by selecting one variate from each population.

3. Methods for the transformation of generalized sampling formulas into the
corresponding infinite and finite sampling formulas.

4. A method for the transformation of infinite sampling formulas into the
corresponding finite sampling formulas.

5. A demonstration of the fact that adjustment formulas for moment-functions
of grouped data involve central moments of a rectangular distribution.

6. A general formula for the expected value of the tth moment of grouped data.

7. New adjustment formulas for central moments of grouped data.

8. New adjustment formulas for Thiele seminvariants of grouped data.

9. A method for the evaluation of the term $\bar\mu_{2:\mu_1}$ which appears in the precise
adjustment formula for the variance.

Many thanks are due Prof. P. S. Dwyer, to whom the writer is greatly
indebted for advice and encouragement.

THE ANALYSIS OF VARIANCE WHEN EXPERIMENTAL ERRORS
FOLLOW THE POISSON OR BINOMIAL LAWS

By W. G. Cochran

1. Introduction. The use of transformations has recently been discussed by
several writers [1], [2], [3], [4], in applying the analysis of variance to
experimental data where there is reason to suspect that the experimental errors are
not normally distributed. Two types of transformations appear to be coming
into fairly common use: $\sqrt{x}$ and $\sin^{-1}\sqrt{x}$. The former is considered
appropriate where the data are small integers whose experimental errors follow the
Poisson law, while the latter applies to fractions or percentages derived from
the ratio of two small integers, where the experimental errors follow the binomial
frequency distribution. In each case the object of the transformation is to put
the data on a scale in which the experimental variance is approximately the
same on all plots, so that all plots may be used in estimating the standard error
of any treatment comparison. The extent to which these transformations are
likely to succeed in so doing has been examined by Bartlett [2]. The object of
the present paper is to discuss the theoretical basis for these transformations in
more detail, and in particular to examine their relation to a more exact analysis.

2. Experimental variation of the Poisson type. The first step in an exact
statistical analysis of the results of any field experiment is to specify in
mathematical terms (1) how the expected values on each plot are obtained in terms of
unknown parameters representing the treatment and block (or row and column)
effects, (2) how the observed values on the plots vary about the expected values.
In this section, the variation is assumed to follow the Poisson law.

The specification of the expected values requires some consideration. In the
standard theory of the analysis of variance, treatment and block (or row and
column) effects are assumed to be additive. In the case of a Latin square, for
example, the expected yield of the ith plot, which receives the tth treatment
and occurs in the rth row and the cth column is written

(1)
\[ m_i = G + T_t + R_r + C_c, \]

where G is a parameter representing the average level of yield in the experiment,
and $T_t$, $R_r$ and $C_c$ represent the respective effects of the treatment, row and
column to which the plot corresponds. Since the T, R and C constants are
required only to measure differences between different treatments, rows and
columns, we may put

(2)
\[ \sum_t T_t = \sum_r R_r = \sum_c C_c = 0. \]

If the experimental errors are normally and independently distributed with
equal variance, this specification leads to very simple equations of estimation
for the unknown parameters, the maximum likelihood estimate of $T_t$, for
example, being the difference between the mean yield of all plots receiving that
treatment and the general mean. In addition to its simplicity, this type of
prediction formula is fairly suitable for general use, because it gives a good
approximation to most types of law which might be envisaged, provided that
row and column differences are small in relation to the mean yield. However,
in considering an exact analysis with Poisson variation, the prediction formula
is assumed chosen, without reference to computational simplicity, as being the
most suitable to describe the combined actions of treatment and soil effects.
The probability of obtaining a given set of plot yields $x_i$ with expectations $m_i$
may be written

\[ \prod_i \frac{e^{-m_i}\,m_i^{x_i}}{x_i!}. \]

Thus L, the logarithm of the likelihood, is given by

(3)
\[ L = \sum_i (x_i \log m_i - m_i) - \sum_i \log x_i! \]

Hence the maximum likelihood equation of estimation for any parameter $\theta$
assumes the form

(4)
\[ \sum \frac{(x_i - m_i)}{m_i}\,\frac{\partial m_i}{\partial\theta} = 0, \]


where the summation extends over all plots whose expectations involve $\theta$. The
function $\frac{\partial m_i}{\partial\theta}$ will usually involve a number of parameters. Since the
specification of row, column and treatment effects in a 6 x 6 Latin square requires 16
independent parameters, the solution of these equations may be expected to be
laborious, though it may be shortened by the intelligent use of iterative methods.
The problem of obtaining exact tests of significance is also difficult. The
method of maximum likelihood provides estimates of the variances and
covariances of the treatment constants, which under certain conditions can be
assumed to be normally distributed if there is sufficient replication, but this can
hardly be considered an exact "small sample" solution.

These remarks show that the exact solution is somewhat too complicated for
frequent use. The difficulty arises principally because the typical equation of
estimation consists of a weighted sum of the deviations of the observed from the
expected values, the weights being $\frac{1}{m_i}\frac{\partial m_i}{\partial\theta}$. The factor $\frac{1}{m_i}$ was introduced into
the weight by the Poisson variation of the experimental errors, and must be
retained in any theory which claims to apply to Poisson variation. It is,
however, worth considering whether some simplification cannot be introduced into
the equations by assuming some particular form for the prediction formula.
This line of approach seems promising when one considers the simplification
introduced into the "normal theory" case by assuming the prediction formula
to be linear.

For Poisson variation, the linear law does not appear to be particularly
suitable, since it may give negative expectations on some plots (as happens in the
numerical example considered in the next section). Further, while $\frac{\partial m_i}{\partial\theta}$ becomes
a constant, the factor $\frac{1}{m_i}$ remains in the weight.

The entire weight can be made constant by assuming a linear prediction
formula in the square roots and transforming the data to square roots. For a
Latin square, this prediction formula is written

(5)
\[ \sqrt{m_i} = \alpha_i = G + T_t + R_r + C_c, \]

where

(6)
\[ \sum_t T_t = \sum_r R_r = \sum_c C_c = 0. \]

To find the maximum value of (3) subject to the restrictions (6), we may use the
method of undetermined multipliers, maximizing

(7)
\[ L + \lambda\Bigl(\sum_t T_t\Bigr) + \mu\Bigl(\sum_r R_r\Bigr) + \nu\Bigl(\sum_c C_c\Bigr). \]


The equation of estimation for a typical treatment constant $T_t$ becomes

(8)
\[ \sum \Bigl(\frac{x_i - m_i}{m_i}\Bigr)\frac{\partial m_i}{\partial\alpha_i}\,\frac{\partial\alpha_i}{\partial T_t} + \lambda = 0, \quad\text{i.e.}\quad \sum \frac{2(x_i - m_i)}{\sqrt{m_i}} + \lambda = 0, \]

the summation being extended over all plots receiving the treatment. If
$a_i = \sqrt{x_i}$, then by Taylor's theorem

(9)
\[ x_i - m_i = (a_i - \alpha_i)\frac{dm_i}{d\alpha_i} + \frac{1}{2}(a_i - \alpha_i)^{2}\frac{d^{2}m_i}{d\alpha_i^{2}}. \]

If $m_i$ is reasonably large, only the first term on the right-hand side need be
retained. When $m_i$ is small, we may use, instead of the exact square root, a
quantity $a'_i$ defined so that

(10)
\[ x_i - m_i = (a'_i - \alpha_i)\frac{dm_i}{d\alpha_i} = 2\sqrt{m_i}\,(a'_i - \alpha_i). \]

Thus if the analysis is performed on the quantities $a'_i$ instead of on the original
data, equation (8) becomes

(11)
\[ \sum_{T_t} 4(a'_i - \alpha_i) + \lambda = 0. \]

On substituting the expectations for $\alpha_i$ from (5), and using (6), we obtain

(12)
\[ \sum_{T_t} 4(a'_i - G - T_t) + \lambda = 0. \]

The corresponding equation for G is

(13)
\[ \sum_i 4(a'_i - G) = 0, \]

so that G is the general mean of the quantities a'. By adding equations (12)
over all treatments, and comparing the total with (13), we find $\lambda = 0$. Hence
$T_t$ is the difference between the mean yield of a' over all plots receiving $T_t$ and
the general mean of a'. In this scale the simplicity of the "normal theory"
equations has apparently been recovered. Actually, the quantities a' are not
known exactly, since

(14)
\[ a' = \alpha + \frac{(x - m)}{2\sqrt{m}} = \frac{1}{2}\Bigl(\alpha + \frac{x}{\alpha}\Bigr), \]

where $\alpha$ is the expected value of $\sqrt{x}$. However, this process provides a means
of successively approximating the maximum likelihood solution, by choosing
first approximations to the quantities $\alpha$, constructing the a''s, solving for the
unknown constants and hence obtaining second approximations to the expected
values. The close relation of a' to $\sqrt{x}$ is seen by remembering one of the
common rules for finding square roots. This consists in guessing an
approximate root ($\alpha$), dividing x by the approximate root, and taking the mean of the
approximate root ($\alpha$) and the resulting quotient ($x/\alpha$).
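
The square-root rule just described is easy to state in code. The sketch below is only an illustration of (14); the function name `adjusted_root` is hypothetical, and the numerical values in the usage lines are those quoted in the worked example of section 3.

```python
from math import sqrt


def adjusted_root(x, alpha):
    """Working value a' = (alpha + x / alpha) / 2, where alpha is the expected root."""
    return 0.5 * (alpha + x / alpha)


# Plot in the first row and first column of the example: x = 3, alpha = 2.046
first_plot = adjusted_root(3, 2.046)

# A plot with zero yield: a' is simply half the expected root
zero_plot = adjusted_root(0, 0.786)

# With x held fixed, repeating the rule converges to the exact square root
a = 2.046
for _ in range(6):
    a = adjusted_root(3, a)
```

Rounded to two decimals, `first_plot` is 1.76 and `zero_plot` is 0.39. In the analysis itself the expected root is re-estimated from the row, column and treatment means at each stage, so a' need not coincide with $\sqrt{x}$.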

The suitability of the linear prediction formula in square roots must be
considered in any example in which the above analysis is being employed. The
law is intermediate in its effects between the linear law and the product law in
the original data. My experience is that it is fairly satisfactory for general use
(cf. [2], p. 72). An exception may occur when it is desired to test the
interaction between two treatments, both of which produce large effects. In this
case the definition chosen for absence of interaction may not coincide at all
closely with the definition implied in using the linear law in square roots. An
example of this case was given in a previous paper [1].

In this connection it should be noted that an approximate "goodness of fit"
test may be obtained of the validity of the assumptions made. Since the
quantities $a'_i$ enter into the equations of estimation with weight 4, the quantity
$4\sum_i (a'_i - \alpha_i)^{2}$ is distributed approximately as $\chi^{2}$ with the number of degrees
of freedom in the error term of the analysis of variance. Some idea of the
closeness of the approximation may be gathered by considering the simplest
case in which only the mean yield is being estimated. In this case the observed
values x are assumed to be drawn from the same Poisson distribution, and the
sufficient statistic for the mean G is known to be $\sum(x)/n$. Since, however, the
prediction formula is here the same in square roots as in the original scale, and
since the maximum likelihood solution is invariant to change of scale, the mean
value $\bar a'$ of a' must be exactly $\sqrt{\bar x}$, as the reader may verify by working
any particular example. Thus $\sum 4(a' - \alpha)^{2}$ is found to be $\sum (x - \bar x)^{2}/\bar x$, the
usual $\chi^{2}$ test for examining whether a set of values x may reasonably be assumed
to come from the same Poisson distribution. By working out the exact
distribution of $\sum (x_i - \bar x)^{2}/\bar x$ in a number of cases [5], I previously expressed the
opinion that this quantity followed the $\chi^{2}$ distribution sufficiently closely for
most practical uses, even for values of the mean as low as 2. This opinion has
since been substantiated by Sukhatme [6], who sampled this distribution for
m = 1, 2, 3, 4, and 5.

A high value of $\chi^{2}$ means either that the prediction formula is not satisfactory
or that the experimental errors are higher than the Poisson distribution
indicates, or that both causes are operating. These effects can sometimes be
separated by examining whether the observed yields deviate from the expected
yields in a systematic or a random manner. If the deviation is systematic, the
prediction formula is probably unsatisfactory.

The type of approach used above resembles in many features the "exact"
analysis for the probit transformation [7]. The principal difference is that in
the case of probits the transformation is made to suit the a priori prediction
formula, which postulates that the probits are a linear function of the dosage,
or of the log (dosage). Thus with probits the equations of estimation still
involve weights in the transformed scale. These do not seriously complicate
the analysis, since only two parameters require to be estimated for a given
poison. With, however, the much greater number of parameters usually
involved in specifying the results of a field experiment, the attractiveness of a
solution which does not involve weighting is greatly increased.

3. Numerical example of the square root transformation. A 5 X 5 Latin
square experiment on the effects of different soil fumigants in controlling
wireworms was selected as an example. The average number of wireworms per
plot (total of four soil samples) was just under five. Previous studies [8], [9]
have indicated that with small numbers per sample, the distribution of numbers
of wireworms tends to follow the Poisson law.

The plan and yields are shown in Table I. The first two figures under the
treatment symbols are the numbers of wireworms and their square roots
respectively, the latter being regarded as first approximations to the values a'. Two
of the plots receiving treatment K gave no wireworms. Since these plots are
likely to be changed most in the transition from square roots to a', better
approximations were estimated for them before proceeding with the calculations.
The best simple approximations appeared to be obtained from the square roots
of the means in the original units. For the plot in the second row and second
column, the square roots of the row, column and treatment means in the original
units are respectively 2.000, 2.145 and 1.095, and the square root of the general
mean is 2.227. Hence

\[ a' = \tfrac{1}{2}[2.000 + 2.145 + 1.095 - 2(2.227)] = 0.39. \]

The other zero value was similarly found to give a' = 0.79. The corresponding
estimates from the means of the square roots were considerably too low, since
the a' values tend to be higher than the square roots. The use of "missing plot"
technique gave very poor approximations, because it ignores the fact that the
plots in question had zero yields.

TABLE I

Plan and number of wireworms per plot

                      Col. 1    Col. 2    Col. 3    Col. 4    Col. 5     Mean
Row 1   Treatment       P         O         N         K         M
        Number (1)      3         2         5         1         4
        Root (2)       1.73      1.41      2.24      1.00      2.00     1.676
        2nd appr. (3)  1.76      1.45      2.25      1.11      2.00     1.714
        3rd appr. (4)  1.77      1.46      2.25      1.10      2.00     1.716

Row 2   Treatment       M         K         O         N         P
        Number (1)      6         0         6         4         4
        Root (2)       2.45     (0.39)     2.45      2.00      2.00     1.858
        2nd appr. (3)  2.45      0.32      2.50      2.02      2.02     1.862
        3rd appr. (4)  2.46      0.32      2.49      2.02      2.02     1.862

Row 3   Treatment       O         M         K         P         N
        Number (1)      4         9         1         6         5
        Root (2)       2.00      3.00      1.00      2.45      2.24     2.138
        2nd appr. (3)  2.10      3.09      1.00      2.47      2.25     2.182
        3rd appr. (4)  2.13      3.08      1.00      2.46      2.25     2.184

Row 4   Treatment       N         P         M         O         K
        Number (1)     17         8         8         9         0
        Root (2)       4.12      2.83      2.83      3.00     (0.79)    2.714
        2nd appr. (3)  4.18      2.84      2.83      3.00      0.77     2.724
        3rd appr. (4)  4.17      2.84      2.83      3.00      0.77     2.722

Row 5   Treatment       K         N         P         M         O
        Number (1)      4         4         2         4         8
        Root (2)       2.00      2.00      1.41      2.00      2.83     2.048
        2nd appr. (3)  2.14      2.02      1.49      2.04      2.92     2.122
        3rd appr. (4)  2.10      2.03      1.50      2.05      2.90     2.116

Mean    Root (2)       2.460     1.926     1.986     2.090     1.972    2.087
        2nd appr. (3)  2.526     1.944     2.014     2.128     1.992    2.121
        3rd appr. (4)  2.526     1.946     2.014     2.126     1.988    2.120

Treatment Means

                        K         P         O         M         N
        Root (2)       1.036     2.084     2.338     2.456     2.520
        2nd appr. (3)  1.068     2.116     2.394     2.482     2.544
        3rd appr. (4)  1.058     2.118     2.396     2.484     2.544

(1) Original numbers. (2) Square roots. (3) Second approximations. (4) Third approximations.

With the estimated values inserted, the row, column, and treatment means
of the square roots are as shown in Table I. A second approximation to a'
was calculated for each plot. For the plot in the first row and the first column,
the expected yield is

\[ \alpha = 1.676 + 2.460 + 2.084 - 2(2.087) = 2.046. \]

Hence $a' = \tfrac{1}{2}(2.046 + 3/2.046) = 1.76$. These values constitute the third set
of figures in Table I. Theoretically, it is advisable to readjust the row, column,
and treatment means after each new value of a' has been obtained, in order to
secure rapid convergence. This is rather laborious in practice, and a complete
set of new plot values was obtained before readjusting the means. The third
approximations obtained by this method are shown in the fourth lines in Table I
and are correct to two decimal places.

It is noteworthy how closely the square roots agree with the third
approximations on all plots except those which originally gave zero yields. The
differences between the second and third approximations are trivial.

The next step is to make a $\chi^{2}$ test by means of the quantity $4\sum (a' - \alpha)^{2}$.
From the manner in which the values $\alpha$ are constructed from the a''s, it follows
that $\sum (a' - \alpha)^{2}$ is simply the error sum of squares in the conventional analysis
of variance of the values a'. The analysis of variance of the third
approximations is shown in Table II.

TABLE II

Analysis of variance of adjusted square roots

               Degrees of freedom    Sum of squares    Mean square
Rows                    4                2.9815
Columns                 4                1.1190
Treatments              4                7.5815           1.8954
Error                  12                4.5970           0.3831

The value of $\chi^2$ is 4 × 4.597 = 18.39, with 12 degrees of freedom, which is
just about the 10 percent level. If the hypothesis is regarded as disproved
only when $\chi^2$ exceeds the 5 percent level, the treatment means may be tested
by regarding them as approximately normally distributed with variance
1/5 × 0.25 = 0.05. It is, however, more prudent to use the actual error mean
square as an estimate of the experimental error variance, performing the usual
tests associated with the analysis of variance. This may be justified on the
grounds that the calculations have produced a set of plot values $\alpha'$ of equal
weight. On this basis the standard error of a treatment mean is $\sqrt{0.3831/5}$ =
0.2768. Treatment K reduced the number of wireworms significantly below
all other treatments, but there is no indication of any difference between the
other treatments. The treatment means may be reconverted to the original
units by squaring.
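
The arithmetic of this test can be retraced with a short sketch. It is an illustration only and not part of the original analysis: the function name `chi2_upper_tail_even_df` is an assumption introduced here, and it uses the closed-form tail sum that holds only for an even number of degrees of freedom, in place of the printed tables on which the paper relies.

```python
from math import exp, sqrt


def chi2_upper_tail_even_df(x, df):
    """P(chi-square with an even number of d.f. exceeds x), via the finite Poisson sum."""
    if df <= 0 or df % 2 != 0:
        raise ValueError("this closed form needs a positive even number of d.f.")
    half = x / 2.0
    term, total = 1.0, 1.0
    for j in range(1, df // 2):
        term *= half / j          # builds (x/2)^j / j! step by step
        total += term
    return exp(-half) * total


error_ss = 4.5970                 # error sum of squares from Table II
error_df = 12
chi2 = 4 * error_ss               # theoretical variance of a square root is 1/4
p_value = chi2_upper_tail_even_df(chi2, error_df)

error_ms = error_ss / error_df
se_treatment_mean = sqrt(error_ms / 5)   # five plots per treatment in a 5 x 5 square

print(round(chi2, 2), round(p_value, 3), round(se_treatment_mean, 4))
```

The tail probability comes out close to 0.10, in line with the statement that the value is just about the 10 percent level.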


4. Experimental variation of the binomial type. In this case the yields are
obtained by examining a constant number n of units per plot and noting those
which possess a certain attribute (e.g., plants which are diseased).
Experimental variation is presumed to arise solely from the binomial variation of the
observed fraction p possessing the attribute about the expected fraction P, which
is specified in terms of unknown parameters representing the treatment and
soil effects.

If $r_i$ is the number possessing the attribute on a typical plot, so that $p_i = r_i/n$,
the likelihood function takes the form

$$\prod_i \frac{n!}{r_i!\,(n - r_i)!}\; P_i^{\,r_i}\, Q_i^{\,n - r_i}.$$

Hence the terms in the logarithm which involve the unknown parameters are
given by

(15) $L = \sum_i \{ r_i \log P_i + (n - r_i) \log Q_i \}.$


The equation of estimation for a typical constant $\theta$ is

(16) $\sum \frac{n(p_i - P_i)}{P_i Q_i}\,\frac{\partial P_i}{\partial \theta} = 0,$

where the summation is over all plots whose expectations involve $\theta$.
As in the Poisson case, an exact solution is laborious because of the weights
$\frac{n}{P_i Q_i}\frac{\partial P_i}{\partial \theta}$. The unequal weighting may be removed by transforming to the
variate $\alpha_i = \sin^{-1}\sqrt{P_i}$, and assuming that the prediction formula is linear
in the transformed scale. For a Latin square the prediction formula is assumed
to be

(17) $\alpha_i = G + T_t + R_r + C_c,$

where the $i$th plot receives treatment $t$ and lies in the $r$th row and $c$th column.
Further

(18) $\sum T_t = \sum R_r = \sum C_c = 0.$


Since $P_i = \sin^2 \alpha_i$, $\frac{\partial P_i}{\partial \alpha_i} = 2\sqrt{P_i Q_i}$. A set of variates $\alpha'_i$ is defined so that
on each plot

(19) $p_i - P_i = \frac{\partial P_i}{\partial \alpha_i}(\alpha'_i - \alpha_i) = 2\sqrt{P_i Q_i}\,(\alpha'_i - \alpha_i).$

With these substitutions, the equation of estimation for $T_t$, for instance,
becomes

(20) $\sum_{T_t} 4n(\alpha'_i - \alpha_i) + \lambda = 0,$

where, as before, $\lambda$ is an undetermined multiplier. The remainder of the
solution proceeds exactly as in the Poisson case, $T_t$ being found to be the difference
between the mean value of $\alpha'_i$ over all plots receiving this treatment and the
general mean of $\alpha'_i$. A $\chi^2$ test may be made with $\sum 4n(\alpha'_i - \alpha_i)^2$.

From (19)

(21) $\alpha'_i = \alpha_i + \frac{p_i - P_i}{2\sqrt{P_i Q_i}} = \alpha_i + \frac{Q_i - q_i}{2\sqrt{P_i Q_i}}$

(22) $\quad = \alpha_i + \tfrac{1}{2}\cot \alpha_i - q_i\,\mathrm{cosec}\,(2\alpha_i),$

where $q_i$ is the observed fraction which does not possess the attribute. The
calculation of approximations to $\alpha'_i$ thus involves finding a predicted value $\alpha_i$
from the treatment and block (or row and column) means, and using equation
(22). Tables [10] of the values of $\sin^{-1}\sqrt{P}$, $\alpha + \tfrac{1}{2}\cot \alpha$, and cosec $(2\alpha)$
have been prepared to facilitate the computations. It should be noted that
these tables are in degrees, whereas the above equations assume that $\alpha_i$ is
measured in radians. In degrees, equation (20) above becomes

(23) $\sum_{T_t} \frac{n\pi^2}{8100}(\alpha'_i - \alpha_i) + \lambda = 0,$

while

(24) $\alpha'_i = \alpha_i + \frac{180}{\pi}\left[\tfrac{1}{2}\cot \alpha_i - q_i\,\mathrm{cosec}\,(2\alpha_i)\right].$
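
The factor that appears when the angles are expressed in degrees is only a change of units. As a worked step, assuming the usual conversion between radians and degrees, the weighted squared deviation transforms as follows:

```latex
\alpha_{\mathrm{rad}} = \frac{\pi}{180}\,\alpha_{\mathrm{deg}}
\quad\Longrightarrow\quad
4n\,(\alpha' - \alpha)_{\mathrm{rad}}^{2}
  = 4n\left(\frac{\pi}{180}\right)^{2}(\alpha' - \alpha)_{\mathrm{deg}}^{2}
  = \frac{n\pi^{2}}{8100}\,(\alpha' - \alpha)_{\mathrm{deg}}^{2},
\qquad
\frac{8100}{\pi^{2}\,n} \approx \frac{820.7}{n}.
```

The reciprocal of the weight, $8100/(\pi^2 n)$, is therefore the theoretical variance of an angle measured in degrees.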

As in the Poisson case, the appropriateness of the linearly additive law in
equivalent angles depends on the way in which treatment and soil effects operate.
As Bliss has shown [11], the effect of the transformation is to flatten out the
cumulative normal frequency distribution, extending the range over which it
can be approximated by a straight line.

5. Numerical example of the angular transformation. The data were selected
from a randomized blocks experiment by Carruth [12] on the control by
mechanical and insecticidal methods of damage due to corn ear worm larvae.
The control and the six types of mechanical protection were chosen for analysis,
the "yields" being the percentages of ears unfit for sale. The numbers of ears
varied somewhat from plot to plot, the average being 36.5, but the variations
were fairly small and appeared to be random. It was considered that
variations in the weight (4n) could be ignored in solving the equations of estimation.

TABLE III

Percentages of unfit ears of corn

                                   Blocks
Treatments      I       II      III     IV      V       VI      Means

    1          42.4¹   34.3    24.1    39.5    55.5    49.1
               40.6²   35.8    29.4    38.9    48.2    44.5    39.57²
               40.7³   36.0    29.4    38.9    48.6    44.6    39.70³

    2          23.5    15.1    11.8     9.4    31.7    15.9
               29.0    22.9    20.1    17.9    34.3    23.5    24.62
               29.1    23.1    20.3    18.2    34.3    23.5    24.75

    3          33.3    33.3     5.0    26.3    30.2    28.6
               35.2    35.2    12.9    30.9    33.3    32.3    29.97
               35.5    35.3    14.5    31.0    33.4    32.4    30.35

    4          11.4    13.5     2.5    16.6    39.4    11.1
               19.7    21.6     9.1    24.0    38.9    19.5    22.13
               19.8    21.7    10.0    24.4    39.9    19.6    22.57

    5          14.3    29.0    10.8    21.9    30.8    15.0
               22.2    32.6    19.2    27.9    33.7    22.8    26.40
               22.6    32.7    19.2    28.0    33.7    22.9    26.52

    6           8.5    21.9     6.2    16.0    13.5    15.4
               17.0    27.9    14.4    23.6    21.6    23.1    21.27
               17.4    28.2    14.5    24.0    22.1    23.2    21.57

    7          16.6    19.3    16.6     2.1    11.1    11.1
               24.0    26.1    24.0     8.3    19.5    19.5    20.23
               24.3    26.2    28.8    10.9    20.1    19.5    21.63

  Means        26.81²  28.87   18.44   24.50   32.79   26.46   26.31

¹ Percentage. ² Equivalent angle. ³ Second approximation.

The percentages of unfit ears, the equivalent angles and the second
approximations to $\alpha'$ are shown in descending order in Table III. The percentages on
individual plots vary from 2.1 to 55.5. The second approximations were
calculated from the block and treatment means of the angles. For the control plot
(treatment 1) in block I, for example, the expected value is

39.57 + 26.81 - 26.31 = 40.07.

Since Fisher and Yates's tables of $\alpha + \tfrac{1}{2}\cot \alpha$ and cosec $(2\alpha)$ are given for
values of $\alpha$ from 45° to 90°, we take the complement of the expected value,
which is 49.93. Interpolating mentally from the table, we find

$\alpha + \tfrac{1}{2}\cot \alpha = 74.0$, cosec $(2\alpha) = 58.3$.

Thus the second approximation to the complement of the angle is

74.0 - 0.424 × 58.3 = 49.3.

Hence the second approximation to $\alpha'$ is 40.7, which agrees very closely with
the equivalent angle.
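
The same step can be sketched without tables by evaluating the adjustment formula of Section 4 directly in degrees. The function names `equivalent_angle` and `adjusted_angle` are assumptions made for this illustration. Because the paper interpolates mentally in printed tables, direct evaluation need not reproduce its figures to the last digit: here it gives about 40.6 where the text reports 40.7.

```python
from math import asin, degrees, pi, radians, sin, sqrt, tan


def equivalent_angle(p):
    """Angle in degrees whose squared sine is the observed fraction p."""
    return degrees(asin(sqrt(p)))


def adjusted_angle(alpha_deg, p_obs):
    """alpha' = alpha + (180/pi) * [cot(alpha)/2 - q * cosec(2 alpha)], angles in degrees."""
    a = radians(alpha_deg)
    q_obs = 1.0 - p_obs                      # observed fraction without the attribute
    return alpha_deg + (180.0 / pi) * (0.5 / tan(a) - q_obs / sin(2.0 * a))


# Control plot (treatment 1) in block I: treatment mean + block mean - general mean.
expected = 39.57 + 26.81 - 26.31
second_approx = adjusted_angle(expected, 0.424)

print(round(expected, 2), round(equivalent_angle(0.424), 2), round(second_approx, 2))
```

Working with the angle itself rather than its complement gives the same result, since the complement is needed only to enter tables that run from 45° to 90°.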

On the majority of the plots, the second approximation differs by only a
trivial amount from the equivalent angle. The plots with the three lowest
percentages (2.1, 2.5, and 5.0) have increased somewhat more, and also one or
two other plots where the angles deviated considerably from the expected values.
A third set of approximations was not considered necessary.

The analysis of variance of the second approximations is given in Table IV.


TABLE IV

              Degrees of freedom   Sum of squares   Mean squares
Blocks                 5               709.79
Treatments             6             1,531.56         255.26
Error                 30               982.67          32.76


Taking n as 36.5, the expected value of the error mean square is 820.7/36.5 =
22.48. Thus $\chi^2$ = 982.67/22.48 = 43.71, with 30 degrees of freedom, which is
almost exactly at the 5 percent level. This, together with the appreciable
amount of the variance removed by blocks, indicates that the experimental
error probably contains some element other than binomial variation. As in the
preceding case, it would be wise to make the usual analysis of variance tests
with the actual error mean square.
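
As a check on these figures, the following sketch recomputes the theoretical variance $8100/(\pi^2 n)$ and the resulting $\chi^2$. The helper `chi2_upper_tail_even_df` is an assumption of this illustration; it relies on the closed form available for an even number of degrees of freedom and stands in for the printed tables.

```python
from math import exp, pi


def chi2_upper_tail_even_df(x, df):
    """Upper tail of chi-square for even d.f., as a finite Poisson sum."""
    if df <= 0 or df % 2 != 0:
        raise ValueError("this closed form needs a positive even number of d.f.")
    half = x / 2.0
    term, total = 1.0, 1.0
    for j in range(1, df // 2):
        term *= half / j
        total += term
    return exp(-half) * total


n_ears = 36.5                                   # average number of ears per plot
theoretical_variance = 8100.0 / (pi ** 2 * n_ears)
error_ss, error_df = 982.67, 30                 # error line of Table IV
chi2 = error_ss / theoretical_variance
p_value = chi2_upper_tail_even_df(chi2, error_df)

print(round(theoretical_variance, 2), round(chi2, 2), round(p_value, 3))
```

The tail probability is very close to 0.05, which agrees with the statement in the text.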

6. Discussion. It must be emphasized that the solutions given above apply
to the case where the whole of the experimental error variation is of the Poisson
or binomial type. The methods are therefore likely to be useful in practice only
where the experimental conditions have been carefully controlled, or where the
data are derived from such small numbers that the Poisson or binomial variation
is much larger than any extraneous variation. The $\chi^2$ test is helpful in deciding
whether this assumption is justified. Further, the examples worked above
indicate that the transformed values form very good approximations on most
plots. It will often be sufficient to adjust only those plots which give zero or
very small values in the Poisson case, or zero or 100 percent values in the
binomial case. In this connection the method of adjustment given above may
perhaps be considered as an improvement on the empirical rule given by Bartlett
[13] of counting n out of n as (n − 1/4) out of n.

Where extraneous variation becomes important, as is probably the normal
case with data derived from field experiments, there seem to be no theoretical
grounds for using the adjusted values. If we were prepared to describe
accurately the nature of the variation other than that of the Poisson or binomial
type, a new set of maximum likelihood equations could be developed. These
would, however, lead to a different type of adjustment.

The justification for the use of transformations has no direct relation to the
Poisson or binomial laws in this case, or in cases where percentages are derived
from the ratios of two weights or volumes, as in chemical analyses, or from an
arbitrary observational scoring. With percentages, for example, it may be
said, without describing the experimental variation in detail, that the variance
must vanish at zero and 100 percent and is likely to be greatest in the middle.
The formula $V = \lambda PQ$ is at least a first approximation to this situation. The
angular transformation will approximately equalize a distribution of variances
of this type, provided that $\lambda$ is sufficiently small. We have, of course, returned
to an "approximate" type of argument. It follows that the original data should
be scrutinized carefully before deciding that a transformation is necessary and
that any presumed opinions about the nature of the experimental variation
should be verified as far as possible.

7. Summary. This paper discusses the theoretical basis for the use of the
square root and inverse sine transformations in analyzing data whose
experimental errors follow the Poisson and binomial frequency laws respectively.

The maximum likelihood equations of estimation are developed for each case,
but are in general too complicated for frequent use. If, however, the expected
yield of any plot is assumed to be an additive function of the treatment and
soil effects in the transformed scale, a transformation can be found so that the
equations of estimation assume the simple "normal theory" form. The
transforms are closely related to the square roots and inverse sines respectively.

The nature of the assumed formula for the expected values is briefly discussed,
and a $\chi^2$ test is developed for the combined hypotheses that the prediction
formula is satisfactory and that the experimental errors follow the assumed law.

Numerical examples are worked for both types of transformation. These
indicate that even for data derived from small numbers, the square roots or
inverse sines are good estimates of the correct transforms on almost all plots,
except those which give zero yields in the Poisson case, or percentages near
zero or 100 in the binomial case.
In practice, these new methods are not recommended to supplant the simple
transformations for general use, because it can seldom be assumed that the
whole of the experimental error variation follows the Poisson or binomial laws.
The more exact analysis may, however, be useful (i) for cases in which the plot
yields are very small integers or the ratios of very small integers, (ii) in showing
how to give proper weight to an occasional zero plot yield.

REFERENCES 

[1] W. G. Cochran, "Some difficulties in the statistical analysis of replicated
experiments," Empire J. Expt. Agric., Vol. 6 (1938), pp. 157-75.

[2] M. S. Bartlett, "The square root transformation in the analysis of variance,"
J. Roy. Stat. Soc. Suppl., Vol. 3 (1936), pp. 68-78.

[3] C. I. Bliss, "The transformation of percentages for use in the analysis of variance,"
Ohio J. Sci., Vol. 38 (1938), pp. 9-12.

[4] A. Clark and W. H. Leonard, "The analysis of variance with special reference to
data expressed as percentages," J. Amer. Soc. Agron., Vol. 31 (1939), pp. 55-56.

[5] W. G. Cochran, "The $\chi^2$ distribution for the binomial and Poisson series, with small
expectations," Ann. Eugen., Vol. 7 (1936), pp. 207-17.

[6] P. V. Sukhatme, "On the distribution of $\chi^2$ in samples of the Poisson series,"
J. Roy. Stat. Soc. Suppl., Vol. 5 (1938), pp. 75-9.

[7] C. I. Bliss, "The determination of the dosage-mortality curve from small numbers,"
Quart. J. Pharmacy and Pharmacology, Vol. 11 (1938), pp. 192-216.

[8] A. W. Jones, "Practical field methods of sampling soil for wireworms," J. Agric. Res.,
Vol. 54 (1937), pp. 123-34.

[9] W. G. Cochran, "The information supplied by the sampling results," Ann. App.
Biol., Vol. 25 (1938), pp. 383-9.

[10] R. A. Fisher and F. Yates, Statistical tables for agricultural, biological and medical
research, Edinburgh, Oliver and Boyd, 1938.

[11] C. I. Bliss, "The analysis of field experimental data expressed in percentages," Plant
Protection (Leningrad), 1937, pp. 67-77.

[12] L. A. Carruth, "Experiments for the control of larvae of Heliothis obsoleta Fabr.,"
J. Econ. Ent., Vol. 29 (1936), pp. 205-9.

[13] M. S. Bartlett, "Some examples of statistical methods of research in agriculture
and applied biology," J. Roy. Stat. Soc. Suppl., Vol. 4 (1937), p. 168, footnote.


Iowa State College, 
Ames, Iowa 



NOTES 

This section is devoted to brief research and expository articles, notes on methodology 
and other short items. 


ORTHOGONAL POLYNOMIALS APPLIED TO LEAST SQUARE FITTING 
OF WEIGHTED OBSERVATIONS 

By Bradford F. Kimball

1. Introduction. Let the independent variable be denoted by $x$, and let it
range over $n$ consecutive integral values $x_1$ to $x_n$. Thus $x$ represents the
index-number of the ordered intervals at which observations are taken, where
the intervals are all of equal length, and an index-number is assigned in
consecutive order to every interval within the range of investigation, whether
observations occur in that interval or not. Let $y_x$ denote the observation measure
(usually referred to as observed value), if such observation exists. Let $w_x$ denote
the weight of that observation, with weight zero assigned where observations
are lacking.

To shorten the notation, summation over all values of $x$ from $x_1$ to $x_n$ will be
denoted by the sign $\sum$. If a subscript and superscript is used, the context will
indicate the variable to which the summation refers. The $r$-th binomial
coefficient will be denoted by $\binom{x}{r}$.


A system of polynomials $\phi_r(x)$, $r = 0, 1, 2, 3, \cdots$ of degree $r$ in $x$ is said to be
an orthogonal system, for the purposes of this paper, if they satisfy the relations

(1) $\sum w_x \phi_r(x)\phi_s(x) = 0$ for $r \ne s$, and $\ne 0$ for $r = s$.

To construct the polynomials, one may write them in the form

(2) $\phi_0(x) = f_0(x) = \text{constant}$, $\quad \phi_r(x) = f_r(x) - \sum_{i=0}^{r-1} h_i \phi_i(x)$, $\quad r = 1, 2, 3, \cdots$

where the $h_i$ are constants and the $f_r(x)$ are arbitrary polynomials of degree $r$.
It then follows from the conditions of orthogonality that

(3) $h_i = \frac{\sum w_x f_r(x)\phi_i(x)}{\sum w_x [\phi_i(x)]^2}.$

Thus when the polynomials $f_r(x)$ have been chosen for all $r$, the system of
orthogonal polynomials for a given set of weights can be constructed and is
uniquely determined except for a constant factor [1].

By virtue of the relation (2) and the conditions of orthogonality (1), it follows
that

(4) $\sum w_x [\phi_r(x)]^2 = \sum w_x f_r(x)\phi_r(x).$

Define the function $\Phi(r, k)$ by

(5) $\Phi(r, k) = \sum w_x f_r(x)\phi_k(x)$, $\quad r = 0, 1, 2, 3, \cdots$

It follows from the relations (2) and (3) that

(6) $\phi_r(x) = f_r(x) - \sum_{i=0}^{r-1} \frac{\Phi(r, i)}{\Phi(i, i)}\,\phi_i(x)$

where it is to be noted that this summation is independent of $x$.

Define $q_r$ and $Y_r$ by

(7) $q_r = \sum w_x [\phi_r(x)]^2 = \sum w_x f_r(x)\phi_r(x) = \Phi(r, r),$

(8) $Y_r = \sum w_x y_x \phi_r(x).$

Then if $u_r(x)$ represents the polynomial solution of degree $r$ of the normal
equations set up for observed values $y_x$ and weights $w_x$,

(9) $u_r(x) = \frac{Y_0}{q_0} + \frac{Y_1}{q_1}\phi_1(x) + \frac{Y_2}{q_2}\phi_2(x) + \cdots + \frac{Y_r}{q_r}\phi_r(x).$

If $E^2$ denotes the weighted sum of the squares of the discrepancies between
the ordinates $u_r(x)$ of the fitted curve and the observed values $y_x$, then [2],

(10) $E^2 = \sum w_x [u_r(x) - y_x]^2 = \sum w_x y_x^2 - \sum_{i=0}^{r} \frac{Y_i^2}{q_i}.$

The practicability of the use of orthogonal polynomials is thus seen to depend
upon whether the quantities $\Phi(r, k)$ and $Y_r$ can be evaluated in a reasonably
simple manner.

The thesis of this paper is that if $f_r(x)$ is taken as the binomial coefficient
$\binom{x}{r}$ one can effectively apply the method of orthogonal polynomials. This is made
possible by the use of factorial moments in conjunction with an adding machine
that prints cumulative totals.

In treating the same problem Aitken sets up the normal equations in terms
of factorials, but considers the explicit use of orthogonal polynomials
impractical. He writes: "the arbitrary nature of the weights stands in the way of
any analytical sophistication; orthogonal polynomials emerge, but are not of
great use; and the necessity of solving the moment equations cannot be
circumvented" [3]. He prefers a determinantal method of solution of the normal
equations which the writer has found to be more involved, from a practical point
of view, than the present method, although it is elegant from a theoretical
standpoint.

Thus although the present method is not new from the point of view of
theory, the writer has found that forms made up by the use of the technique
suggested below offer an effective method for fitting polynomial curves to
weighted observations.


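2. Simplification of the problem when $f_r(x) = \binom{x}{r}$. Factorial moments $S_r$
and $M_r$ are defined by

(11) $S_r = \sum \binom{x}{r} w_x$, $\quad M_r = \sum \binom{x}{r} w_x y_x.$

These moments are not difficult to compute and are readily checked as
computed. Formula for $\Phi(r, k)$ then becomes

(12) $\Phi(r, k) = \sum \binom{x}{r} w_x \phi_k(x).$

Thus since $\phi_0(x) = 1$, $\Phi(r, 0) = \sum \binom{x}{r} w_x = S_r$ and hence

$\phi_1(x) = \binom{x}{1} - \frac{S_1}{S_0}.$

Again

$\Phi(r, 1) = \sum \binom{x}{r} w_x \left[\binom{x}{1} - \frac{S_1}{S_0}\right] = (r + 1)S_{r+1} + rS_r - \frac{S_1}{S_0}S_r.$

Hence

$q_1 = \Phi(1, 1) = 2S_2 + \left(1 - \frac{S_1}{S_0}\right)S_1.$

A recursion formula for $\Phi(r, k)$ may be obtained by expanding $\phi_k(x)$ in formula
(12) by means of (6). Thus

(13) $\Phi(r, k) = \sum \binom{x}{r}\binom{x}{k} w_x - \sum_{i=0}^{k-1} \frac{\Phi(r, i)\,\Phi(k, i)}{q_i}.$

The first term can be easily expressed as a linear combination of binomial
coefficients, and thus as a linear combination of moments $S_i$.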
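The formula for $Y_r$ can be broken down as follows:

$Y_0 = \sum w_x y_x = M_0,$

$Y_r = \sum w_x y_x \phi_r(x) = \sum w_x y_x \binom{x}{r} - \sum_{i=0}^{r-1} \frac{\Phi(r, i)}{q_i}\left[\sum w_x y_x \phi_i(x)\right] = M_r - \sum_{i=0}^{r-1} \frac{\Phi(r, i)}{q_i}\,Y_i.$

Thus

$Y_1 = M_1 - \frac{S_1}{S_0}\,Y_0,$

$Y_2 = M_2 - \frac{\Phi(2, 1)}{q_1}\,Y_1 - \frac{\Phi(2, 0)}{q_0}\,Y_0$, etc.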


3. General technique of computation. In determining the best fitting
polynomial of degree $r$, the ratios $\Phi(r, i)/q_i$ are seen to play an important part.
In a form for calculation, these quantities should receive simple designations
such as $\delta_i$ for a second degree curve, $\epsilon_i$ for a third degree curve, etc. Suppose
they are designated by $R_i$ for a curve of degree $r$; then

(16) $Y_r = M_r - \sum_{i=0}^{r-1} R_i Y_i$

(17) $q_r = \sum \binom{x}{r}^2 w_x - \sum_{i=0}^{r-1} R_i\,\Phi(r, i)$

and in determining $\Phi(r, k)$ for $k = 0, 1, 2, \cdots, r - 1$, formula (13) may be
written

(18) $\Phi(r, k) = \sum \binom{x}{r}\binom{x}{k} w_x - \sum_{i=0}^{k-1} R_i\,\Phi(k, i).$

The fact that these quantities $R_i$ appear as multipliers in so many of the
fundamental formulas greatly simplifies the mechanics of the calculation,
especially when a calculating machine is used.
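
A minimal sketch of the construction, under stated assumptions, may make the role of the multipliers $R_i$ concrete. It takes $f_r(x) = \binom{x}{r}$, tabulates each orthogonal polynomial on the given abscissae, and forms $\Phi(r, i)$ by direct summation rather than through the factorial moments and recursion that the note recommends for hand computation. The function name `fit_weighted_orthogonal` and the use of exact fractions are choices made for this illustration only.

```python
from fractions import Fraction
from math import comb


def fit_weighted_orthogonal(xs, ys, ws, degree):
    """Weighted least-squares fit by orthogonal polynomials built on binomial coefficients.

    xs must be non-negative integers; a weight of zero marks a missing observation.
    Returns the fitted ordinates at every x and E^2 from formula (10).
    """
    f = [[Fraction(comb(x, r)) for x in xs] for r in range(degree + 1)]
    phis, qs, Ys = [], [], []
    for r in range(degree + 1):
        phi = list(f[r])
        for i in range(r):
            # Phi(r, i) = sum of w * f_r * phi_i, and R_i = Phi(r, i) / q_i
            Phi_ri = sum(w * a * b for w, a, b in zip(ws, f[r], phis[i]))
            R_i = Phi_ri / qs[i]
            phi = [p - R_i * b for p, b in zip(phi, phis[i])]
        phis.append(phi)
        qs.append(sum(w * p * p for w, p in zip(ws, phi)))
        Ys.append(sum(w * y * p for w, y, p in zip(ws, ys, phi)))
    fitted = [sum(Y / q * phi[j] for Y, q, phi in zip(Ys, qs, phis))
              for j in range(len(xs))]
    e2 = sum(w * y * y for w, y in zip(ws, ys)) - sum(Y * Y / q for Y, q in zip(Ys, qs))
    return fitted, e2


# Hypothetical data: a quadratic observed at six intervals, one of them missing (weight 0).
xs = [0, 1, 2, 3, 4, 5]
ws = [1, 2, 0, 3, 1, 2]
ys = [1, 6, 0, 34, 57, 86]
fitted, e2 = fit_weighted_orthogonal(xs, ys, ws, 2)
print([int(v) for v in fitted], e2)
```

Since every quantity is an exact fraction, the identity in formula (10) can be checked without rounding error; the fit needs more abscissae of positive weight than the degree of the curve.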

In the final determination of the polynomial curve the differences of the polynomial
at $x = 0$ are readily determined since the leading term of each orthogonal
polynomial is a binomial coefficient and thus

(19) $\Delta^k \phi_r(0) = -\sum_{i=0}^{r-1} R_i\,\Delta^k \phi_i(0)$, $\quad k = 1, 2, 3, \cdots, r - 1$; $\quad \Delta^r \phi_r(0) = 1.$
Since the effectiveness of the method depends upon the availability of an
adding machine which records a cumulative subtotal, the determination of the
curve from the differences at the point $x = 0$ is not a hardship and indeed
affords a quick and accurate means of setting up the curve for purposes of
plotting and checking.

(20) $u_r(0) = \frac{Y_0}{q_0} + \frac{Y_1}{q_1}\phi_1(0) + \frac{Y_2}{q_2}\phi_2(0) + \cdots + \frac{Y_r}{q_r}\phi_r(0),$

$\Delta^k u_r(0) = \frac{Y_k}{q_k} + \frac{Y_{k+1}}{q_{k+1}}\Delta^k \phi_{k+1}(0) + \cdots + \frac{Y_r}{q_r}\Delta^k \phi_r(0),$

$\Delta^r u_r(0) = \frac{Y_r}{q_r}.$

The advantage of the use of orthogonal polynomials becomes particularly
apparent when error formulae are to be used. The formula for the sum of the
squares of the discrepancies, denoted by $E^2$, is given above (formula (10)).
The estimated variance $V$ of the weighted observations about the fitted curve
is thus $E^2/(n - r - 1)$ where $n$ is the number of values of $x$ used in fitting
and $r$ is the degree of the curve fitted. Recalling that the matrix of the normal
equations is of the diagonal form with diagonal elements $q_0, q_1, \cdots, q_r$ it
follows that the coefficient $Y_k/q_k$ of $\phi_k(x)$ in the expansion of $u_r(x)$ has the
variance $V/q_k$.

Furthermore the variance of the ordinate of the fitted curve $u_r(x)$ at a point $x$
due to sampling variations in the determination of the coefficients of the curve,
under the assumption that the weights and values of the independent variable $x$
do not involve errors, has the simple form

(21) Variance of $u_r(x)$ at point $x$ $= V\left[\frac{1}{q_0} + \frac{[\phi_1(x)]^2}{q_1} + \cdots + \frac{[\phi_r(x)]^2}{q_r}\right]$

since the covariances of the orthogonal polynomials are zero [4].
COMBINATORIAL FORMULAS FOR THE rth STANDARD MOMENT
OF THE SAMPLE SUM, OF THE SAMPLE MEAN, 

AND OF THE NORMAL CURVE 

By P. S. Dwyer 


The standard moments of the normal curve are usually expressed by the two
statements [1, p. 97]

(1) $\alpha_{2s} = \frac{(2s)!}{2^s s!}$, $\quad \alpha_{2s+1} = 0.$

It is of some interest to note that these two statements may be generalized into
a single statement by observing that $\frac{(2s)!}{2^s s!}$ is the number of ways in which $2s$
things can be grouped in pairs and that 0 is the number of ways in which $2s + 1$
things can be grouped in pairs. It is obvious that an odd number of things
cannot be grouped in pairs since there must be at least one unpaired unit. It
is clear, too, that the number of orders in which $2s$ things can be grouped in
pairs is $\binom{2s}{2}\binom{2s-2}{2}\binom{2s-4}{2}\cdots\binom{2}{2}$ and this is $\frac{(2s)!}{2^s}$. However if the
resulting paired groups (rather than the orders of grouping) are counted it is
seen that each paired grouping is repeated $s!$ times so that $\frac{(2s)!}{2^s s!}$ represents the
number of ways $2s$ things can be grouped in pairs. If we arbitrarily define the
number of ways 0 things can be grouped in pairs to be 1 (or if we limit our
theorem to values of $r > 0$) we may say "The $r$th standard moment of the
normal curve is equal to the number of ways in which $r$ things can be grouped
in pairs."
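
The counting argument can be checked by brute force. The sketch below, with function names assumed for the purpose, enumerates the pairings of $r$ labelled things recursively and compares the count with the closed form $(2s)!/(2^s s!)$ for $r = 2s$, taking the value 1 for $r = 0$ as in the convention just stated.

```python
from math import factorial


def count_pairings(items):
    """Number of ways to split the given items into unordered pairs, by enumeration."""
    if not items:
        return 1                  # the empty set counts as one (empty) grouping
    if len(items) % 2 == 1:
        return 0                  # an odd number of things always leaves one unpaired
    rest = items[1:]
    # Pair the first item with each remaining item in turn, then pair up what is left.
    return sum(count_pairings(rest[:k] + rest[k + 1:]) for k in range(len(rest)))


def pairing_formula(r):
    """Closed form: (2s)! / (2^s s!) when r = 2s, and 0 when r is odd."""
    if r % 2 == 1:
        return 0
    s = r // 2
    return factorial(r) // (2 ** s * factorial(s))


counts = [count_pairings(list(range(r))) for r in range(9)]
print(counts)
```

The values for even $r$ are the familiar even standard moments of the normal curve, 1, 3, 15, 105, and so on.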

As presented above the combination representation is used primarily as a
means of unification of results. However, it is possible to derive the standard
moments of the normal curve in such a way as to indicate the term
$\frac{r!}{2^{\frac{1}{2}r}(\frac{1}{2}r)!}$ early
in the proof and to trace it throughout the proof. I follow the method outlined
by H. C. Carver [2] in obtaining the normal distribution as the limit of the
distribution of sample sums (or of sample means) though I use a somewhat

different notation [3, p. 6]. If we let 1 represent the number of 

ways in which $r$ units can be collected with $\pi_1$ groups containing $p_1$ units, $\pi_2$
groups containing $p_2$ units, etc., then the multinomial theorem can be expressed
as [3, p. 17]


( 2 ) 
where the summation is taken over all possible partitions $p_1^{\pi_1} \cdots p_s^{\pi_s}$ of $r$ and
the expression $(p_1^{\pi_1} \cdots p_s^{\pi_s})$ represents the power product form [3, p. 14] which
is $\pi_1!\,\pi_2! \cdots \pi_s!$ times the monomial symmetric function. If $\rho$ represents the
number of parts of the partition then

$\rho = \pi_1 + \pi_2 + \cdots + \pi_s$

while

$r = p_1\pi_1 + p_2\pi_2 + \cdots + p_s\pi_s.$

Now it can be shown from (2) in the case of infinite sampling that 



and since pi = 0, it is only necessary to sum over all partitions which have no 
unit part. We have then, dividing by [p 2 ;d)] if = [nfef 7 



We have now a formula for the rth standard moment of the sample sum which 

is expressed essentially in combination notation since the quantity 

represents the number of ways in which $r$ units can be grouped to form $\pi_1$
groups containing $p_1$ units, $\pi_2$ groups containing $p_2$ units, etc. All non-unitary
groupings of $r$ are formed, each combinatorial coefficient is computed and
multiplied by $n^{(\rho)}/n^{\frac{1}{2}r}$ times the product of the corresponding $\alpha$'s, and the sums are
formed. It might be noted that the formula for the $r$th standard moment of
the sample mean is identical with (4) while the corresponding finite sampling
(without replacements) formula is



(5) 


“r:«) 



N P P P * l. 


N ir pi r 



The P's are defined in previous papers [2, p. 105-6][3, p. 113].

We obtain the formula for the $r$th standard moment of the normal curve by
taking the limit of (4) as $n \to \infty$. (H. C. Carver has pointed out [2, p. 121]
that this method of derivation imposes fewer restrictions than does the
derivation from Hagen's hypothesis.) Each partition term will approach zero as $n$
approaches infinity if $\rho < \frac{1}{2}r$. Now the only non-unitary partition in which
$\rho$ is not less than $\frac{1}{2}r$ is the partition $2^{\frac{1}{2}r}$ and we can have this partition only when
$r$ is even. Now the limit as $n$ approaches infinity of $n^{(\rho)}/n^{\frac{1}{2}r}$ is unity and we
have, in the limiting case

(6) $\alpha_r = \frac{r!}{2^{\frac{1}{2}r}(\frac{1}{2}r)!}$ if $r$ is even, $\quad \alpha_r = 0$ if $r$ is odd.
Since $\frac{r!}{2^{\frac{1}{2}r}(\frac{1}{2}r)!}$ is the number of ways $r$ units can be grouped in pairs when $r$ is
even and since 0 is the number of ways $r$ units can be grouped in pairs where
$r$ is odd, it follows that the $r$th standard moment of the normal curve is the
number of ways in which $r$ units can be grouped in pairs.

This development is of interest in that it makes possible the tracing of the
value $\frac{r!}{2^{\frac{1}{2}r}(\frac{1}{2}r)!}$ back through the various stages of the development to the coefficient
of $(2^{\frac{1}{2}r})$ in the power product expansion of the multinomial theorem.
ON A METHOD OF SAMPLING¹


By E. G. Olds 


It is recorded that Diogenes fared forth with a lantern in his search for an
honest man. History does not tell us how many dishonest men he encountered
before he found the first honest one but, judging from the fact that he took his
lantern, apparently he expected to have a long search. The general problem of
sampling inspection, of which the above is a special case, can be stated as follows:

Given a lot, of size $m$, containing $s$ items of a specified kind. If items are
to be drawn without replacement until $i$ of the $s$ items have been drawn, how
many drawings, on the average, will be necessary?

Uspensky² has solved a problem concerning balls in an urn, from which the
answer to the above question can be obtained for the special case $i = 1$. For
the general case, the distribution for the number $n$ of the drawing in which the
$i$th specified item appears, is given by terms of the series:

(1) $\nu_0 = \sum_{n=i}^{m-s+i} \frac{C_{n-1,\,i-1}\;C_{m-n,\,s-i}}{C_{m,\,s}},$

¹ Presented to The Institute of Mathematical Statistics, Dec. 27, 1938, at Detroit, Mich.,
as part of a paper entitled "Remarks on two methods of sampling inspection."

² J. V. Uspensky, Introduction to Mathematical Probability, McGraw-Hill, New York,
1937, p. 178.
where the first symbol indicates the number of ways of choosing $i - 1$ of the
specified items to fill the first $n - 1$ places, the second symbol indicates the
number of ways of disposing of $s - i$ specified items in the last $m - n$ places,
and the denominator gives the number of ways that the $s$ items can be scattered
through the lot. In order to get the average number of draws we multiply
by $n$ and sum. Then we have

(2) $\bar{n} = \sum_{n=0}^{m} \frac{n\,C_{n-1,\,i-1}\,C_{m-n,\,s-i}}{C_{m,\,s}} = \frac{i(m+1)}{s+1}\sum_{n=0}^{m} \frac{C_{n,\,i}\,C_{m-n,\,s-i}}{C_{m+1,\,s+1}} = \frac{i(m+1)}{s+1}.$

Example 1. On a table of 200 bargain shirts there are 5 which have a 15 in.
neckband and 35 in. sleeves. How many shirts must be examined, on the
average, to find two of the desired kind?

Solution. For this case, $m = 200$, $s = 5$, $i = 2$. Therefore $\bar{n} = [2(201)] \div 6 = 67$.
Thus, an average of 67 shirts must be examined.

Suppose $\mu_K$ represents the $K$th moment about the mean, $\nu_K$ the $K$th moment
about the origin, and $\nu'_K$ the moment relation given by

(3) $\nu'_K = (\nu + K - 1)^{(K)},$

where $(\nu + K - 1)^{(K)}$ represents the result of expanding $(\nu + K - 1)^{(K)}$ and
changing the exponent of $\nu$ to the corresponding subscript. (For example,
$\nu'_3 = (\nu + 2)^{(3)} = \nu_3 + 3\nu_2 + 2\nu_1$.) It is easy to derive the recurrence relation

(4) $\nu'_K = \frac{(i + K - 1)(m + K)}{s + K}\,\nu'_{K-1}.$

From this result the computation of the moments about the mean is theoretically
direct. Actually the results do not seem to be very compact. The variance is
given by

(5) $\mu_2 = \frac{(m + 1)(m - s)}{(s + 1)^2 (s + 2)}\,[i(s + 1) - i^2].$
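
The mean and variance can be verified exactly for any particular lot by summing over the distribution of $n$. The sketch below does this for the shirts of Example 1, taking $m = 200$ as the size of the lot; the function names are assumptions made for the illustration.

```python
from fractions import Fraction
from math import comb


def draw_distribution(m, s, i):
    """Exact P(n): probability that the i-th specified item appears on draw n."""
    return {n: Fraction(comb(n - 1, i - 1) * comb(m - n, s - i), comb(m, s))
            for n in range(i, m - s + i + 1)}


def mean_and_variance(dist):
    mean = sum(n * p for n, p in dist.items())
    variance = sum((n - mean) ** 2 * p for n, p in dist.items())
    return mean, variance


m, s, i = 200, 5, 2                     # lot size, specified items, items wanted
dist = draw_distribution(m, s, i)
mean, variance = mean_and_variance(dist)

closed_mean = Fraction(i * (m + 1), s + 1)
closed_variance = Fraction((m + 1) * (m - s), (s + 1) ** 2 * (s + 2)) * (i * (s + 1) - i * i)

print(mean, float(variance))
```

Exact fractions are used so that the enumerated moments can be compared with the closed forms without any rounding tolerance.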


In case $s$ is unknown and $n$ is known for a particular value of $i$, we may
estimate $s$ (or rather $\frac{1}{s+1}$) by using the relation $n = \frac{i(m + 1)}{s + 1}$. Then

(6) est. $\frac{1}{s + 1} = \frac{n}{i(m + 1)},$

and the variance, using this estimate, is given by

(7) Variance of est. $\frac{1}{s + 1} = \frac{n}{n + i(m + 1)}\cdot\frac{1}{i(m + 1)}\left[\frac{n}{i} - 1\right]\left[1 - \frac{n}{m + 1}\right].$

Example 2. In order to check a box of 144 screws, screws are drawn until
10 good screws are obtained. In a particular case only 10 drawings were
necessary. Estimate the number of good screws in the lot.

Solution. Here $m = 144$, $i = 10$, $n = 10$. The estimate for $s$ is obtained
from $\frac{1}{s_{\text{est.}} + 1} = \frac{10}{10(145)} = \frac{1}{145}$ and, as might be expected, the conclusion
is that all the screws are good. Furthermore the variance of the estimated
quantity is zero.

It is obvious that the number of draws necessary to obtain any particular
number of specified items is correlated with the numbers of draws for lesser
numbers of items. To investigate this, let us suppose that $n_j$ represents the
number of draws to obtain exactly $j$ specified items and that $x_j = n_j - n_{j-1}$.
It follows immediately from our previous results, that

(8) $E(x_1) = E(x_2) = E(x_3) = \cdots = \frac{m + 1}{s + 1}.$

This result could be obtained from the fact that, corresponding to any
arrangement of the lot for which $x_j = a$ and $x_k = b$, there is another arrangement
where $x_j = b$ and $x_k = a$, formed by moving $a - b$ of the non-specified items
from the first group to the second. From this fact we see, also, that

(9) $E(x_1^2) = E(x_2^2) = E(x_3^2) = \cdots.$

But $x_1 = n_1$ and $\sigma^2_{n_1} = \frac{(m + 1)(m - s)}{(s + 1)^2 (s + 2)}[s + 1 - 1] = ds.$

Therefore,

(10) $\sigma^2_{x_1} = \sigma^2_{x_2} = \cdots = ds.$

But, from our previous formula we have

$\sigma^2_{n_2} = d(2s - 2)$, $\quad \sigma^2_{n_3} = d(3s - 6)$, etc.

Since $n_2 = x_1 + x_2$, it follows that

$\sigma^2_{n_2} = \sigma^2_{x_1} + 2 r_{x_1 x_2}\sigma_{x_1}\sigma_{x_2} + \sigma^2_{x_2}$

where $r_{x_1 x_2}$ is the correlation between $x_1$ and $x_2$. Therefore,

(11) $r_{x_1 x_2} = -1/s.$

Also, since $x_1 = n_2 - x_2$, it follows that

(12) $r_{n_2 x_2} = \sqrt{\frac{s - 1}{2s}}.$

Likewise, from $x_2 = n_2 - x_1$, we get

(13) $r_{n_2 x_1} = \sqrt{\frac{s - 1}{2s}}.$

Finally, we obtain the three general results

(14) $r_{n_i x_{i+1}} = -\sqrt{\frac{i}{s(s - i + 1)}},$

(15) $r_{n_i x_i} = \sqrt{\frac{s - i + 1}{si}},$

(16) $r_{n_i n_{i+1}} = \sqrt{\frac{i(s - i)}{(i + 1)(s - i + 1)}}.$

Example 3. The cards of a deck are turned one by one until two aces have
appeared. The second ace appears when the 36th card is turned. How many
more cards should one expect to have to turn to find a third ace?

Solution. Here $m = 52$, $s = 4$, $i = 2$, $n_2 = 36$.
Then $\bar{n}_2 = \frac{106}{5}$, $\bar{x}_3 = \frac{53}{5}$, and $r_{n_2 x_3} = -\sqrt{\frac{2}{12}} = -\frac{\sqrt{6}}{6}$. Also
$\sigma_{x_3} = \sqrt{4d}$ and $\sigma_{n_2} = \sqrt{6d}$. Since $\frac{x_3 - \bar{x}_3}{\sigma_{x_3}} = r_{n_2 x_3}\,\frac{n_2 - \bar{n}_2}{\sigma_{n_2}}$, we have

$x_3 = \frac{53}{5} - \frac{\sqrt{6}}{6}\cdot\frac{2}{\sqrt{6}}\left(36 - \frac{106}{5}\right) = \frac{17}{3}.$

Of course this result could have been obtained more directly by noting that
there were two aces left among the 16 remaining cards.
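
Both the correlation between successive waiting times and the answer to Example 3 can be confirmed by enumerating every equally likely arrangement of the specified items. The sketch below is only an illustration under that equal-likelihood assumption; the function names, and the small lot ($m = 10$, $s = 3$) used for the correlation check, are hypothetical choices.

```python
from fractions import Fraction
from itertools import combinations


def gap_moments(m, s):
    """Exact means, variances and covariance of x_1 and x_2 over all arrangements."""
    arrangements = list(combinations(range(1, m + 1), s))
    total = len(arrangements)
    x1 = [a[0] for a in arrangements]            # draws needed for the first item
    x2 = [a[1] - a[0] for a in arrangements]     # further draws needed for the second
    mean1, mean2 = Fraction(sum(x1), total), Fraction(sum(x2), total)
    var1 = Fraction(sum(v * v for v in x1), total) - mean1 ** 2
    var2 = Fraction(sum(v * v for v in x2), total) - mean2 ** 2
    cov = Fraction(sum(p * q for p, q in zip(x1, x2)), total) - mean1 * mean2
    return mean1, mean2, var1, var2, cov


def expected_wait_for_next(m, s, i, n_i):
    """E(x_{i+1}) given that the i-th specified item appeared on draw n_i."""
    waits = [a[i] - n_i for a in combinations(range(1, m + 1), s) if a[i - 1] == n_i]
    return Fraction(sum(waits), len(waits))


mean1, mean2, var1, var2, cov = gap_moments(10, 3)
third_ace_wait = expected_wait_for_next(52, 4, 2, 36)
print(mean1, cov / var1, third_ace_wait)
```

Because the two gaps have equal variances, the ratio of covariance to variance is the correlation itself, so no square roots are needed in the check.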


Conclusion. The results given in this note might be useful when it is
necessary to estimate the number of items to be drawn in order to secure a desired
number of a particular type, such as may be the case in obtaining a sample
with previously defined characteristics. Also the note disproves such intuitive
notions as the one that when looking for a desired record, one is most likely to
have to search the whole pile to find it. As far as methods of sampling
inspection are concerned, the one implied in this note has little to recommend it.

Carnegie Institute of Technology, 

Pittsburgh, Pa 


RANK CORRELATION WHEN THERE ARE EQUAL VARIATES 1 

By Max A. Woodbury 

If there is given a set of number pairs

(1) $(X_1, Y_1), (X_2, Y_2), \ldots, (X_N, Y_N),$

we may assign to each variate its “rank” (i.e. one more than the number of
corresponding variates in the set greater than the given variate). In this way
there is obtained a set of pairs of ranks

(2) $(x_1, y_1), (x_2, y_2), \ldots, (x_N, y_N).$

¹ Presented at the fall meeting, Mich. section of the Math. Assn. of America, Nov. 18,
1939, Kalamazoo College.

If we assume that $X_i \neq X_j$ and $Y_i \neq Y_j$ when $i \neq j$, then it follows that
each integer from 1 to N appears once and only once in the x’s and the same
holds for the y’s. This leads at once to the formulas:

(3a) $\sum_{i=1}^{N} x_i = \sum_{i=1}^{N} y_i = \sum_{i=1}^{N} i = N(N+1)/2,$

(3b) $\sum_{i=1}^{N} x_i^2 = \sum_{i=1}^{N} y_i^2 = \sum_{i=1}^{N} i^2 = N(N+1)(2N+1)/6.$

When these results are substituted in the expression for the product moment
correlation coefficient we have, after simplifying [1],

(4) $\rho = 1 - 6 \sum_{i=1}^{N} D_i^2 / N(N^2 - 1)$, where $D_i = x_i - y_i$.

If we consider the case of equal variates and follow the rule for assigning
ranks given in the first paragraph, the resulting method is known as the
bracket-rank method. The use of (4) in the calculation of $\rho$ by this method is not
strictly valid, because not every integer appears in the summations and so
neither (3a) nor (3b) is true.

The more accurate mid-rank method assigns to each of the equal variates
the average of the ranks that would be assigned if we were to give them an
arbitrary order. This method preserves (3a) but not (3b). In this paper $\rho_M$
indicates the value of $\rho$ as calculated by (4) when the mid-rank method is used.

In a method due to DuBois [2], the equal variates are assigned the same rank
so as to satisfy (3b). In this case (3a) is not satisfied.

If we assign the ranks to the equal variates in an arbitrary way, then (3a)
and (3b) are of course satisfied and the use of (4) is valid. There are two
disadvantages to such a method: first, the equal variates are treated differently,
and second, the assignment of ranks is arbitrary. These difficulties are removed
if one uses the average of the values of $\rho$ corresponding to all possible ways of
arbitrarily assigning ranks to the equal variates. Since $\rho$ is linear in $\sum_i D_i^2$, the
average value of $\rho$ may be obtained from the average value of $\sum_i D_i^2$ and the use
of (4).

Let us first consider the simple case of two equal variates in one of the
variables, say X. It is clear that there are only two possible ways of assigning
ranks, and that if we arrange the series by the assigned x ranks, the resulting
series differ only in the y ranks corresponding to the equal X variates. If we
denote the two x ranks to be assigned by m and m + 1 and the y’s corresponding
for a particular arrangement by $y_m$ and $y_{m+1}$ we have for the average $\sum_i D_i^2$ the
expression

(5a) $\sum_{x=1}^{m-1} (x - y_x)^2 + \sum_{x=m+2}^{N} (x - y_x)^2 + \tfrac{1}{2}\left[(m - y_m)^2 + (m + 1 - y_{m+1})^2 + (m - y_{m+1})^2 + (m + 1 - y_m)^2\right].$

By the mid-rank method the corresponding expression is

(5b) $\sum_{x=1}^{m-1} (x - y_x)^2 + \sum_{x=m+2}^{N} (x - y_x)^2 + (m + \tfrac{1}{2} - y_m)^2 + (m + \tfrac{1}{2} - y_{m+1})^2.$

The correction $A_2$ to be added to the mid-rank $\sum_i D_i^2$ to get the average $\sum_i D_i^2$ is,
by subtracting (5b) from (5a) and simplifying,

(6) $A_2 = \tfrac{1}{2}.$


To get $A_K$ in the more general case of several equal variates, we need only
consider the difference between the average value of $\sum_i D_i^2$ and that obtained by the
mid-rank method. If there are K equal X variates we may assign the ranks
in K! ways; this results in K! permutations of the y ranks for the sets arranged
in order of their assigned x ranks. In (K − 1)! permutations $y_{m+j}$ corresponds
to the x rank of m + i so that the correction to the mid-rank $\sum_i D_i^2$ is

(7) $A_K = \dfrac{1}{K!}\left[(K-1)! \sum_{i=0}^{K-1} \sum_{j=0}^{K-1} (m + i - y_{m+j})^2\right] - \sum_{j=0}^{K-1} \left(m + \dfrac{K-1}{2} - y_{m+j}\right)^2$

$\quad = \dfrac{1}{K} \sum_{i=0}^{K-1} \sum_{j=0}^{K-1} \left[(m + i - y_{m+j})^2 - \left(m + \dfrac{K-1}{2} - y_{m+j}\right)^2\right]$

$\quad = \dfrac{K(K^2 - 1)}{12}.$

It is to be noticed that the correction is positive and depends only on the number
of equal X variates. From this it can be concluded that for more than one
group of equal variates, no matter whether X’s or Y’s, we can obtain the average
$\sum_i D_i^2$ by computing a correction for each group and then adding these
corrections to get the total correction to the mid-rank $\sum_i D_i^2$. Then, as before noted,
we can by (4) calculate the average $\rho$ (denoted as $\bar{\rho}$).

This correction to $\sum_i D_i^2$ may be converted into a correction to $\rho_M$. That is, if

(8) $\delta_{N,K} = \dfrac{6 A_K}{N(N^2 - 1)} = \dfrac{K(K^2 - 1)}{2N(N^2 - 1)},$

then

$\bar{\rho} = \rho_M - \sum_i \delta_{N,K_i},$

where the summation extends over all groups of equal variates, and $K_i$ is the
number of equal variates in the ith group.

A table of $\delta_{N,K}$ for different values of N and K is given, and also a table of
$A_K$. The values $A_K$ are given in the top row of the table, while the $\delta_{N,K}$ are
given in the rows below.
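
Individual entries of the table can be recomputed from (7) and (8). The following is a minimal sketch; the function names `a_correction` and `delta` are hypothetical, chosen for this illustration.

```python
from fractions import Fraction

def a_correction(k):
    """A_K = K(K^2 - 1)/12, the amount added to the mid-rank sum of D^2 for K equal variates."""
    return Fraction(k * (k * k - 1), 12)

def delta(n, k):
    """delta_{N,K} = 6 A_K / (N (N^2 - 1)), the corresponding amount subtracted from rho_M."""
    return 6 * a_correction(k) / (n * (n * n - 1))

# Entries for N = 14, the sample size used in the worked example of the paper.
for k in (2, 3, 4, 5):
    print(k, float(a_correction(k)), round(float(delta(14, k)), 4))
```

Exact fractions avoid rounding until the final display, which makes comparison with four-decimal table entries straightforward.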





Table of $A_K$ and $\delta_{N,K}$

    K       2      3      4      5      6      7      8      9     10     11     12     13
  A_K  0.5000  2.000      5     10   17.5     28     42     60   82.5    110    143    182

    N                                  $\delta_{N,K}$
    3   .1250      —      —      —      —      —      —      —      —      —      —      —
    4   .0500  .2000      —      —      —      —      —      —      —      —      —      —
    5   .0250  .1000  .2500      —      —      —      —      —      —      —      —      —
    6   .0143  .0571  .1429  .2857      —      —      —      —      —      —      —      —
    7   .0089  .0357  .0893  .1786  .3125      —      —      —      —      —      —      —
    8   .0060  .0238  .0595  .1190  .2083  .3333      —      —      —      —      —      —
    9   .0042  .0166  .0417  .0833  .1458  .2333  .3500      —      —      —      —      —
   10   .0030  .0121  .0303  .0606  .1061  .1697  .2546  .3636      —      —      —      —
   11   .0023  .0091  .0227  .0455  .0795  .1273  .1909  .2727  .3750      —      —      —
   12   .0017  .0070  .0176  .0350  .0612  .0979  .1469  .2098  .2885  .3846      —      —
   13   .0014  .0055  .0137  .0275  .0480  .0769  .1154  .1648  .2266  .3022  .3929      —
   14   .0011  .0044  .0110  .0220  .0385  .0615  .0923  .1319  .1813  .2418  .3143  .4000
   15   .0009  .0036  .0089  .0179  .0313  .0500  .0750  .1071  .1473  .1964  .2554  .3250
   16   .0007  .0029  .0074  .0147  .0257  .0412  .0618  .0882  .1213  .1618  .2103  .2676
   17   .0006  .0025  .0061  .0123  .0214  .0343  .0516  .0735  .1011  .1348  .1752  .2230
   18   .0005  .0021  .0052  .0103  .0181  .0289  .0433  .0619  .0851  .1135  .1476  .1878
   19   .0004  .0018  .0044  .0088  .0154  .0246  .0368  .0526  .0724  .0905  .1254  .1596
   20   .0004  .0016  .0038  .0075  .0132  .0211  .0316  .0451  .0620  .0827  .1075  .1368
   21   .0003  .0013  .0032  .0065  .0114  .0182  .0273  .0390  .0636  .0714  .0929  .1182
   22   .0003  .0011  .0028  .0056  .0099  .0158  .0237  .0339  .0466  .0621  .0807  .1028
   23   .0002  .0010  .0025  .0049  .0086  .0138  .0208  .0296  .0408  .0543  .0708  .0899
   24   .0002  .0009  .0022  .0043  .0076  .0122  .0183  .0261  .0359  .0478  .0622  .0791
   25   .0002  .0008  .0019  .0038  .0067  .0108  .0162  .0231  .0317  .0423  .0550  .0700
   26   .0002  .0007  .0017  .0034  .0060  .0096  .0144  .0205  .0282  .0376  .0489  .0622
   27   .0002  .0006  .0015  .0031  .0053  .0085  .0128  .0183  .0252  .0336  .0437  .0556
   28   .0001  .0005  .0014  .0027  .0048  .0077  .0115  .0164  .0226  .0301  .0391  .0498
   29   .0001  .0005  .0012  .0025  .0043  .0069  .0103  .0148  .0203  .0271  .0352  .0448
   30   .0001  .0004  .0011  .0022  .0039  .0062  .0093  .0133  .0184  .0245  .0318  .0405
   35   .0001  .0003  .0007  .0014  .0025  .0039  .0059  .0084  .0116  .0154  .0200  .0255
   40   .0000  .0002  .0005  .0009  .0016  .0026  .0039  .0056  .0077  .0103  .0134  .0171
   45   .0000  .0001  .0003  .0007  .0012  .0018  .0028  .0040  .0054  .0072  .0094  .0120
   50   .0000  .0001  .0002  .0004  .0007  .0011  .0016  .0023  .0032  .0043  .0055  .0070
   60   .0000  .0001  .0001  .0003  .0005  .0008  .0012  .0017  .0023  .0031  .0040  .0061
   70   .0000  .0000  .0001  .0002  .0003  .0005  .0007  .0010  .0014  .0019  .0025  .0032
   80   .0000  .0000  .0001  .0001  .0002  .0003  .0005  .0007  .0010  .0013  .0017  .0021
   90   .0000  .0000  .0000  .0001  .0001  .0002  .0003  .0005  .0007  .0009  .0012  .0015
  100   .0000  .0000  .0000  .0000  .0001  .0002  .0003  .0004  .0005  .0007  .0009  .0011

As an example of the use of the table we will consider the following problem
[2, p. 56], with the ranks assigned as for the mid-rank method.

  Subject     I      II
     A        1      2.5
     B        4     10
     C        4      2.5
     D        4      5
     E        4      7
     F        4      2.5
     G        7      8
     H        8      2.5
     I        9.5    6
     J        9.5   12
     K       11     11
     L       13     13
     M       13      9
     N       13     14

For the mid-rank method we have

$\sum_{i=1}^{14} D_i^2 = 119.5, \quad N = 14,$

$\rho_M = 1 - \dfrac{6(119.5)}{14(196 - 1)} = 0.7374.$

Referring to the table we find that

   $K_i$    $A_{K_i}$   $\delta_{N,K_i}$
    2       0.5      0.0011
    3       2.0      0.0044
    4       5.0      0.0110
    5      10.0      0.0220
  Total    17.5      0.0385

We know that $\bar{\rho} = 1 - \dfrac{6(119.5 + 17.5)}{14(196 - 1)} = 0.6989$ and in terms of $\delta_{N,K}$

$\bar{\rho} = 0.7374 - 0.0385 = 0.6989.$
The value given by DuBois for his method is 0.7511. 
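
The claim that the corrected value equals the average of $\rho$ over every arbitrary assignment of ranks to the equal variates can be checked by brute force on the example above. The sketch below is an illustration under stated assumptions, not the author's procedure: the helper names are hypothetical, and the enumeration assumes the ties in the two variables are broken independently of one another.

```python
from collections import Counter
from itertools import permutations, product

def sum_d2(x, y):
    """Sum of squared rank differences, the quantity entering formula (4)."""
    return sum((a - b) ** 2 for a, b in zip(x, y))

def rho(d2, n):
    """Formula (4): rho = 1 - 6 * sum(D^2) / (N (N^2 - 1))."""
    return 1 - 6 * d2 / (n * (n * n - 1))

def total_correction(*rankings):
    """Sum of A_K = K(K^2 - 1)/12 over every tied group in every ranking."""
    return sum(k * (k * k - 1) / 12
               for ranks in rankings
               for k in Counter(ranks).values() if k > 1)

def tie_breakings(ranks):
    """Every ranking obtained by ordering the members of each tied group arbitrarily."""
    groups = {}
    for idx, r in enumerate(ranks):
        groups.setdefault(r, []).append(idx)
    options = []
    for r, members in groups.items():
        k = len(members)
        first = r - (k - 1) / 2              # lowest integer rank covered by the group
        slots = [first + j for j in range(k)]
        options.append([(members, p) for p in permutations(slots)])
    result = []
    for choice in product(*options):
        out = list(ranks)
        for members, perm in choice:
            for idx, value in zip(members, perm):
                out[idx] = value
        result.append(out)
    return result

# Mid-ranks of subjects A to N from the worked example.
rank_I = [1, 4, 4, 4, 4, 4, 7, 8, 9.5, 9.5, 11, 13, 13, 13]
rank_II = [2.5, 10, 2.5, 5, 7, 2.5, 8, 2.5, 6, 12, 11, 13, 9, 14]
n = len(rank_I)

d2_mid = sum_d2(rank_I, rank_II)
rho_mid = rho(d2_mid, n)
rho_bar = rho(d2_mid + total_correction(rank_I, rank_II), n)

ys = tie_breakings(rank_II)
values = [rho(sum_d2(x, y), n) for x in tie_breakings(rank_I) for y in ys]
rho_brute = sum(values) / len(values)
print(round(rho_mid, 4), round(rho_bar, 4), round(rho_brute, 4))
```

The enumeration visits 1440 × 24 = 34,560 assignments, which is small enough to run directly; for larger tied groups the closed-form correction is the practical route.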


Conclusion. A method has been developed for the treatment of rank correlation
where there are groups of equal variates. The method consists of applying
a generally small correction to the value as ordinarily calculated by the
mid-rank method in order to find the value which would be obtained by averaging
the values of the rank correlation coefficient for all possible ways of arbitrarily
assigning ranks to the equal variates. Thanks are due Professor P. S. Dwyer,
without whose aid and encouragement this paper would not have been written.


NOTE ON THEORETICAL AND OBSERVED DISTRIBUTIONS OF
REPETITIVE OCCURRENCES

By P. S. Olmstead 

1. A simple problem of repetitive occurrences. Two questions which the
engineer often desires to answer whenever he has a new type of apparatus or a
new design of an old type of apparatus are: How many times will it perform
its intended function without failure? and How many times will it fail to perform
its intended function in a given length of time? To do this, he selects a number
of what he believes to be identical units of the apparatus and gives each unit a
performance test under a uniform test procedure. The number of satisfactory
operations prior to the first observed failure to perform this operation is called
a “run” and is a measure of the type desired for each unit.

If it is assumed that the probability of failure at any operation is a constant, q,
and the probability of satisfactory operation is 1 − q or p, then the mathematical
probabilities of runs of 0, 1, 2, 3, ... satisfactory operations for any
unit are

(1) $q,\ pq,\ p^2 q,\ p^3 q,\ \ldots$

respectively.

Let x denote the number of satisfactory operations in any run. The mean
value of x, say $m_x$, is given by

(2) $m_x = \dfrac{p}{q}.$

The variance of x is

(3) $\sigma_x^2 = \dfrac{p}{q^2}.$

The first step in practice is to determine whether there exists a constant
probability, p, by means of the application of the operation of statistical
control.¹ Expressions (1), (2), and (3) provide the necessary information for doing
this. When a constant probability exists as evidenced by at least 25 consecutive
samples of 4 units each the following practical procedure has been found
to be satisfactory.

1. An estimate of p (or q), the sole parameter of the distribution, can be
obtained from the average length of run in the sample. If p is less than 0.6
and if the sample size is large, a reasonably good estimate of p can be obtained
from the proportion of the sample having runs of zero length.

2. The probability of getting runs of length x or more is $p^x$. Thus, if a
minimum (or maximum) value of the probability, $p^x$, is chosen, a maximum
(or minimum) expected length of run can be computed for use as a criterion
for looking for assignable causes of variation in the length of individual runs
by using the estimated value of p.

3. The average and standard deviation to be used in calculating the limits
to be applied to successive samples of rational sub-groups in accordance with
the Shewhart² Criterion I are given by Equations (2) and (3) in which the
estimates of p and q are substituted.

¹ W. A. Shewhart, “Statistical Method from the Viewpoint of Quality Control,” The
Department of Agriculture Graduate School, Washington, 1939, Chapter I.
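
The quantities used in steps 1 to 3 can be sketched as follows. This is a minimal illustration: the function names and the guard on the inputs are assumptions made here, and the run-length criterion is implemented as the smallest x for which $p^x$ does not exceed a chosen tail probability.

```python
def run_probabilities(p, max_x):
    """Probabilities q, pq, p^2 q, ... of runs of 0, 1, 2, ... satisfactory operations, as in (1)."""
    q = 1 - p
    return [p ** x * q for x in range(max_x + 1)]

def mean_run(p):
    """Equation (2): m_x = p / q."""
    return p / (1 - p)

def var_run(p):
    """Equation (3): variance = p / q^2."""
    return p / (1 - p) ** 2

def run_length_limit(p, tail):
    """Smallest x with P(run of length x or more) = p**x <= tail."""
    if not (0 < p < 1 and 0 < tail <= 1):
        raise ValueError("need 0 < p < 1 and 0 < tail <= 1")
    x = 0
    while p ** x > tail:
        x += 1
    return x

print(mean_run(0.5), var_run(0.5), run_length_limit(0.5, 0.125))
```

A loop is used for the limit instead of a ratio of logarithms so that exact cases such as p = 0.5 and a tail of 0.125 are not disturbed by floating-point rounding.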

2. Application to a signal transmission problem. The theoretical solution 
given above is a direct answer to the first question at the head of this note. 

TABLE I 


Observed distributions of runs of x occurrences of event E for various test periods of 



The second question is also of interest particularly when failure to perform an 
operation does not impair the apparatus unit for performance of additional 
operations. In cases of this type, the engineer often lets his test continue for 
test periods of particular lengths, measured in numbers of operations or some¬ 
times in intervals of time (i.e., time intervals are often considered to be propor¬ 
tional to numbers of operations) and observes the number of failures during the 
test period for each unit. Thus, he may, after he has assured himself that
control exists, arrange his data for each test period to show the frequency of
occurrence of 0, 1, 2, 3, ... failures per unit.

² Loc. cit.

Data of this type, which are typical of those found in other studies made
during the past two years, are presented in Table I. These were obtained in a
signal transmission study in which the data for successive periods were obtained
TABLE II

Comparison of observed and theoretical values of averages and variances for
distributions of Table I

Statistic or                                              Test Period
Parameter                                1      2      3      4      5      6      7      8     11     15
$\hat{q} = n_0/n$         observed      .916   .853   .786   .719   .679   .646   .632   .617   .532   .491
$\bar{x}$                 observed      .098   .171   .269   .381   .448   .543   .537   .633   .917  1.026
$m_x = \hat{p}/\hat{q}$   theoretical*  .091   .172   .272   .390   .471   .548   .583   .620   .881  1.039
$\sigma_x^2$              observed      .091   .200   .343   .497   .556   .832   .760  1.075  1.783  1.921
$\sigma_x^2 = \hat{p}/\hat{q}^2$  theoretical*  .098   .202   .346   .642   .693   .848   .924  1.005  1.658  2.117

* Based on assumption that $\hat{q}$ is the true value of q.


TABLE III

Theoretical distributions corresponding to distributions of Table I calculated by
using $\hat{q} = n_0/n$ as the true value of q

No. of                                              Test Period
Occurrences    Freq.
per Period x              1       2       3       4      5      6      7      8     11     15
0              $n_0$*   878.0  1519.0   961.0   723.0  541.0  407.0  343.0  266.0          77.0
1              $n_1$     73.3   233.5   205.3   202.8  173.3  144.1  126.4  101.9   74.9   39.2
2              $n_2$      6.1    32.9    43.8    56.9   55.5   51.0   46.6   39.0   35.1   20.0
3              $n_3$       .5     4.8     9.4    16.0   17.8   18.0   17.1   14.9   16.5   10.2
4              $n_4$       .1      .7             4.5    5.7    6.4    6.3    5.7    7.7    5.2
5              $n_5$               .1      .4     1.3    1.8    2.3    2.3    2.2    3.6    2.6
6              $n_6$                       .1      .4     .6     .8     .9     .8    1.7    1.4
7              $n_7$                               .1     .2     .3     .3     .3     .8     .7
8              $n_8$                                      .1     .1     .1     .1     .4     .3
9 or over                                                                      .1     .3     .4
Sample Size    n*         958    1781    1222    1005    796    630    543    431    301    157

* The observed values of $n_0$ and n form the basis for the calculated distributions.


for separate units. Since each set of these data passed the scrutiny for control,
there is justification for assuming that a statistical universe exists and that its
functional form may be derived from the observed distribution. It was found
that these data were consistent with the assumption that, where the probability
of non-occurrence of a failure on a unit in the test period was q, the probability
of exactly x failures on a unit was $p^x q$. This set of mathematical probabilities
is shown in (1) with q redefined to apply in this case to non-occurrence of a
failure.

Observed and “theoretical” values of the averages and variances for the
observed distributions are shown in Table II. The basis for calculating the
theoretical values was to take the ratio (designated $\hat{q}$) of $n_0$ to n for each
distribution as the estimate of the true value, q. Distributions as shown in Table III

TABLE IV

Test of fit of theoretical to observed distributions (Table III and Table I, respectively)

                                    Test Period
                  1      2      3      4      5      6      7      8     11     15
$\chi^2$*        2.24   0.20   0.32          9.79                              3.98
Degrees of
Freedom           1      2      2      3      3      3      3      3      4      4
P*               .13    .90    .87    .55           .87    .36                  .41

* Minimum number in cell for theoretical distribution taken as 5.


were calculated from each $\hat{q}$. These distributions were tested against the
observed distributions by means of the $\chi^2$ test with the results shown in Table IV,
which are all within reasonable limits of what might be expected when a
constant probability exists.
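
The construction of a theoretical column from the observed $n_0$ and n can be sketched as below. The function name is hypothetical, and the frequencies are taken to be $n \hat{q} \hat{p}^x$ with $\hat{q} = n_0/n$, which is an assumption consistent with the description in the text and with the first test period of Table III.

```python
def fitted_frequencies(n0, n, max_x):
    """Expected numbers of units showing 0, 1, ..., max_x failures when q is estimated by n0/n."""
    q_hat = n0 / n
    p_hat = 1 - q_hat
    return [n * q_hat * p_hat ** x for x in range(max_x + 1)]

# Test period 1: n0 = 878 units without failure out of n = 958.
freqs = fitted_frequencies(878, 958, 4)
print([round(f, 1) for f in freqs])
# Theoretical mean p/q under the same estimate.
print(round((1 - 878 / 958) / (878 / 958), 3))
```

Cells with small expected counts would normally be pooled before a $\chi^2$ comparison, in line with the footnote to Table IV.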

3. Conclusions. When a constant probability applies to each operation in a
repetitive process this note shows how to establish criteria for identifying
significantly long or short lengths for individual runs and significantly high or low
average lengths for groups of several runs. A problem taken from the field of
signal transmission gives assurance of the existence of this type of distribution
in practice.

Bell Telephone Laboratories,

New York, N. Y.


THE DISTRIBUTION THEORY OF RUNS

By A. M. Mood 

1. Introduction. In studying a particular sample, the order in which the
elements of the sample were drawn is frequently available to the statistician.
This important information is usually entirely neglected by him. Such
disregard must be attributed, to a considerable extent, to the unsatisfactory state
of mathematical devices for using the knowledge in question. One reasonable
mathematical method for handling this information, the one to be used in this
paper, is to make use of the distribution of runs. A run is defined as a succession
of similar events preceded and succeeded by different events; the number of
elements in a run will be referred to as its length.

The distribution theory of runs has had a stormy career. The theory seems 
to have been started toward the end of the nineteenth century rather than in the 
days of Laplace when there was so much interest in games of chance. In 1897 
Karl Pearson [1], in a discussion of data taken from the roulette tables at Monte 
Carlo, wrote "... the theory of runs is a very simple one." In this book he 
developed no theory but it is evident from his computations that he regarded the 
distribution of runs as a special case of the multinomial distribution. The 
multinomial method, besides evading the issue somewhat and raising questions 
of random sampling, also gives incorrect results when one is interested in runs 
of more than one kind of element. In 1899 Karl Marbe [2] derived an expression 
for the mean of the number of iterations of a given length from a binomial 
population. This result was incorrect because he neglected dependence between 
overlapping iterations. An iteration is defined as a sequence of similar events; a 
run of length t is counted as t — s + 1 iterations of length s for s < t. Marbe 
has assembled a great mass of data with the object of proving the popular 
hypothesis that a “head” becomes highly probable after a long succession of 
“tails” has appeared. Ordinary significance tests applied to his data do not 
support this contention, but Marbe continues to advocate it [3] and [5]. Of 
course, he has been severely criticised by many mathematical statisticians. 

In 1904 Grünbaum [6] derived the mean of the number of runs of given length
from a binomial population by the multinomial method. The first correct
formulae were derived in 1906 by Bruns [7] who found the mean and variance of
the number of iterations of given length in samples from a binomial population.
In a book published in 1917 von Bortkiewicz correctly derived for the first time
the mean and variance of runs from a binomial population using a method similar
to that of Bruns. This book [8] contains a great many formulae for means and
variances of runs and iterations under various special circumstances; a large
portion of it is devoted to an exhaustive criticism of Marbe’s work. In 1921 von
Mises [9] showed that the number of long runs of given length was approximately
distributed according to the Poisson law for large samples.

It was not until 1925 (so far as the author has been able to ascertain) that an
actual distribution function appeared when Ising [10] gave the number of ways of
obtaining a given total number of runs (without regard to length) from
arrangements of two kinds of elements. Stevens [12] in 1939 published the same
distribution and described a $\chi^2$ criterion for significance. Wald and Wolfowitz [13]
in 1940 published the same distribution and showed that it was asymptotically
normal. These papers are all concerned with random arrangements of a fixed
number of elements of each of two kinds; the last mentioned paper describes a
very interesting application of the distribution to the problem of testing the
hypothesis that two samples have come from the same continuous distribution.
Wishart and Hirshfeld [11] in 1936 derived the distribution of the total number of
runs (again without regard to length) in samples from a binomial population and
showed it was asymptotically normal.

In this paper we shall derive distributions of runs of given length both from 
random arrangements of fixed numbers of elements of two or more kinds, and 
from binomial and multinomial populations. Also we shall give the limiting 
form of these distributions as the sample size increases. These limiting
distributions are all normal. The distribution problem is, of course, a
combinatorial one, and the whole development depends on some identities in combinatory
analysis, some new and some well known to students of partition theory.

The paper will be divided into two parts. The first will deal with distribu¬ 
tions obtained from random arrangements of a fixed number of each kind of 
element. The second will deal with distributions of elements from a binomial 
or multinomial population. 


Part I 

2. Distribution of runs of two kinds of elements. Consider random
arrangements of n elements of two kinds, for example a’s and b’s, with $n_1 + n_2 = n$.
Let $r_{1i}$ denote the number of runs of a’s of length i, and let $r_{2i}$ denote the number
of runs of b’s of length i. For example the arrangement

a b b a b a a a b b a a a

will be characterized by the numbers $r_{11} = 2$, $r_{13} = 2$, $r_{21} = 1$, $r_{22} = 2$, and all
other $r_{ij} = 0$. Also we let $r_1 = \sum_i r_{1i}$ and $r_2 = \sum_i r_{2i}$ denote the total number of
runs of a’s and b’s respectively. Throughout the paper a binomial coefficient
will be denoted by

(2.1) $\binom{m}{k} = \dfrac{m!}{k!\,(m-k)!}$

and this is defined to be zero when m < k. A multinomial coefficient will often
be denoted by

(2.2) $\begin{bmatrix} m \\ m_i \end{bmatrix} = \dfrac{m!}{m_1!\, m_2! \cdots m_s!},$

(2.3) $\sum_i m_i = m, \quad m_i \geq 0,$

and when such a coefficient is to be summed over the indices $m_i$ the two
conditions (2.3) are always understood and will not be repeated; other conditions on
the indices will be placed below the summation sign.
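
The run counts $r_{1i}$ and $r_{2i}$ defined at the start of this section can be tabulated mechanically. The sketch below is an illustration only; the function name `run_counts` is hypothetical, and the arrangement is written as a plain string of a’s and b’s.

```python
from collections import Counter
from itertools import groupby

def run_counts(sequence):
    """Map each kind of element to a Counter {run length: number of runs of that length}."""
    counts = {}
    for kind, group in groupby(sequence):
        length = sum(1 for _ in group)
        counts.setdefault(kind, Counter())[length] += 1
    return counts

# The arrangement used as the example in the text.
counts = run_counts("abbabaaabbaaa")
print(dict(counts["a"]), dict(counts["b"]))
print(sum(counts["a"].values()), sum(counts["b"].values()))   # r_1 and r_2
```

`itertools.groupby` groups consecutive equal elements, which is the definition of a run used in the paper.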

Given a set of numbers $r_{ij}$ (i = 1, 2; j = 1, 2, ..., $n_i$) such that $\sum_j j r_{ij} = n_i$,
there are $\begin{bmatrix} r_1 \\ r_{1j} \end{bmatrix}$ and $\begin{bmatrix} r_2 \\ r_{2j} \end{bmatrix}$ different arrangements of the runs of a’s and b’s
respectively. Hence the total number of ways of obtaining the set $r_{ij}$ is

(2.4) $N(r_{ij}) = \begin{bmatrix} r_1 \\ r_{1j} \end{bmatrix} \begin{bmatrix} r_2 \\ r_{2j} \end{bmatrix} F(r_1, r_2),$

where $F(r_1, r_2)$ is the number of ways of arranging $r_1$ objects of one kind and $r_2$
objects of another so that no two adjacent objects are of the same kind. Thus

(2.5) $F(r_1, r_2) = 0$ if $|r_1 - r_2| > 1$,
$\qquad\qquad\quad = 1$ if $|r_1 - r_2| = 1$,
$\qquad\qquad\quad = 2$ if $r_1 = r_2$.

Since there are $\binom{n}{n_1}$ possible arrangements of the a’s and b’s, we have at once the
distribution of the $r_{ij}$

(2.6) $P(r_{ij}) = \dfrac{\begin{bmatrix} r_1 \\ r_{1j} \end{bmatrix} \begin{bmatrix} r_2 \\ r_{2j} \end{bmatrix} F(r_1, r_2)}{\binom{n}{n_1}}.$

Certain marginal distributions will also be of interest. To obtain, for example,
the distribution of the $r_{1j}$, it is first necessary to sum over all partitions of
$n_2$. This is easily accomplished by finding the coefficient of $x^{n_2}$ in

$(x + x^2 + x^3 + \cdots)^{r_2} = x^{r_2}(1 + x + x^2 + \cdots)^{r_2} = x^{r_2} \sum_{t=0}^{\infty} \binom{r_2 - 1 + t}{r_2 - 1} x^t.$

The term corresponding to $t = n_2 - r_2$ gives the desired result:

(2.7) $\sum \begin{bmatrix} r_2 \\ r_{2j} \end{bmatrix} = \binom{n_2 - 1}{r_2 - 1}.$

We have then

(2.8) $P(r_{1j}, r_2) = \dfrac{\begin{bmatrix} r_1 \\ r_{1j} \end{bmatrix} \binom{n_2 - 1}{r_2 - 1} F(r_1, r_2)}{\binom{n}{n_1}}$

and summing this over $r_2$, a slight simplification gives

(2.9) $P(r_{1j}) = \dfrac{\begin{bmatrix} r_1 \\ r_{1j} \end{bmatrix} \binom{n_2 + 1}{r_1}}{\binom{n}{n_1}}.$

The distribution (2.6) summed over $r_{1j}$ and $r_{2j}$ gives by means of (2.7)

(2.10) $P(r_1, r_2) = \dfrac{\binom{n_1 - 1}{r_1 - 1} \binom{n_2 - 1}{r_2 - 1} F(r_1, r_2)}{\binom{n}{n_1}},$

which is essentially the distribution derived by Wald and Wolfowitz [13], and
summing this over $r_2$ we get the distribution discussed by Stevens [12]

(2.11) $P(r_1) = \dfrac{\binom{n_1 - 1}{r_1 - 1} \binom{n_2 + 1}{r_1}}{\binom{n}{n_1}}.$
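
For small samples the distributions (2.10) and (2.11) can be compared with a direct enumeration of all arrangements. The sketch below is an illustrative check under stated assumptions, not part of the paper: the function names are hypothetical, and the formulas are coded in the form $\binom{n_1-1}{r_1-1}\binom{n_2-1}{r_2-1}F(r_1,r_2)/\binom{n}{n_1}$ and $\binom{n_1-1}{r_1-1}\binom{n_2+1}{r_1}/\binom{n}{n_1}$.

```python
from collections import Counter
from itertools import combinations, groupby
from math import comb

def arrangements(n1, n2):
    """Yield every arrangement of n1 a's and n2 b's as a string."""
    n = n1 + n2
    for positions in combinations(range(n), n1):
        chosen = set(positions)
        yield "".join("a" if i in chosen else "b" for i in range(n))

def runs_of(seq, kind):
    """Total number of runs of the given kind of element."""
    return sum(1 for k, _ in groupby(seq) if k == kind)

def F(r1, r2):
    """Number of ways to interleave r1 and r2 runs with no two adjacent of the same kind, as in (2.5)."""
    d = abs(r1 - r2)
    return 2 if d == 0 else 1 if d == 1 else 0

def p_r1(n1, n2, r1):
    """Distribution (2.11) of the number of runs of a's."""
    return comb(n1 - 1, r1 - 1) * comb(n2 + 1, r1) / comb(n1 + n2, n1)

def p_r1_r2(n1, n2, r1, r2):
    """Joint distribution (2.10) of the numbers of runs of a's and b's."""
    return comb(n1 - 1, r1 - 1) * comb(n2 - 1, r2 - 1) * F(r1, r2) / comb(n1 + n2, n1)

n1, n2 = 4, 3
seqs = list(arrangements(n1, n2))
observed = Counter(runs_of(s, "a") for s in seqs)
for r1 in sorted(observed):
    print(r1, observed[r1] / len(seqs), p_r1(n1, n2, r1))
```

Enumeration grows as $\binom{n}{n_1}$, so this kind of check is only practical for small n; its purpose is to guard against transcription slips in the closed forms.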


Another marginal distribution which will be useful is obtained by summing
(2.9) over $r_{1i}$ for $i \geq k$. If we let

$s_{1j} = r_{1j}, \quad j < k,$

$s_{1k} = \sum_{j=k}^{n_1} r_{1j}, \qquad A = \sum_{j=1}^{k-1} j r_{1j},$

we must then sum the multinomial coefficient

$\dfrac{s_{1k}!}{r_{1k}! \cdots r_{1n_1}!}$

over all partitions of $n_1 - A$ such that every part is greater than k − 1. This
is given by the coefficient of $x^{n_1 - A}$ in

$(x^k + x^{k+1} + \cdots)^{s_{1k}} = x^{k s_{1k}} \sum_{t=0}^{\infty} \binom{s_{1k} - 1 + t}{t} x^t;$

thus we have

(2.12) $\sum\nolimits^{(k)} \dfrac{s_{1k}!}{r_{1k}! \cdots r_{1n_1}!} = \binom{n_1 - A - (k-1)s_{1k} - 1}{s_{1k} - 1},$

where $\sum^{(k)}$ denotes summation over all positive integers $r_{1k}, r_{1,k+1}, \ldots, r_{1n_1}$
such that $\sum_{j=k}^{n_1} j r_{1j} = n_1 - A$. This identity with (2.9) gives

(2.13) $P(s_{1i}) = \dfrac{\begin{bmatrix} s_1 \\ s_{1i} \end{bmatrix} \binom{n_2 + 1}{s_1} \binom{n_1 - A - (k-1)s_{1k} - 1}{s_{1k} - 1}}{\binom{n}{n_1}}, \quad i = 1, 2, \ldots, k.$

Another useful distribution analogous to (2.13) is derived by considering runs
of both kinds of elements. If we define $s_{2j}$ (j = 1, 2, ..., h) and B in terms of
$r_{2j}$ just as $s_{1i}$ and A were defined above, it follows at once from (2.6) and (2.12)
that

(2.14) $P(s_{1i}, s_{2j}) = \dfrac{\begin{bmatrix} s_1 \\ s_{1i} \end{bmatrix} \begin{bmatrix} s_2 \\ s_{2j} \end{bmatrix} \binom{n_1 - A - (k-1)s_{1k} - 1}{s_{1k} - 1} \binom{n_2 - B - (h-1)s_{2h} - 1}{s_{2h} - 1} F(s_1, s_2)}{\binom{n}{n_1}},$

$\qquad i = 1, 2, \ldots, k; \quad j = 1, 2, \ldots, h.$

These last two distributions should be the most useful for applications. The
long runs have been added together to form the new variables $s_{1k}$ and $s_{2h}$, thus
decreasing materially the number of variables as compared with (2.6) and (2.9)
while at the same time little information is lost. One is free to choose k and h
so that the number of variables is appropriate for the data at hand. Moreover,
it is shown in Section 5 that these variables are asymptotically normally
distributed so that one may apply a simple $\chi^2$ test of significance for “randomness of
elements with respect to order” when dealing with large samples. We shall
then be able to test whether a sample has been “randomly” drawn in a certain
sense.




3. Moments for runs of two kinds of elements. Instead of dealing with the
ordinary moments we shall obtain formulae for the factorial moments because
the expressions are much more compact. As is customary, a factorial will be
denoted by

(3.1) $x^{(a)} = x(x - 1)(x - 2) \cdots (x - a + 1),$

and $x^{(0)}$ is defined to be 1. Of course the ordinary moments are determined by
the factorial moments by means of relations of the type

$x^a = \sum_i C_i^a x^{(i)}.$

A recent discussion of the coefficients $C_i^a$ has been given by Joseph [14]. The
mathematical expectation of a function f(r) will be denoted by

(3.2) $E(f(r)) = \sum_r f(r) P(r).$

Of course E is a linear operator. We shall require the following identity

(3.3) $\sum\nolimits^{(1)} \prod_i r_{1i}^{(a_i)} \begin{bmatrix} r_1 \\ r_{1i} \end{bmatrix} = r_1^{(\sum a_i)} \binom{n_1 - \sum i a_i - 1}{r_1 - \sum a_i - 1},$

where $\sum^{(1)}$ denotes summation over all positive integers $r_{11}, r_{12}, \ldots, r_{1n_1}$ such
that $\sum_i i r_{1i} = n_1$. (3.3) may be verified by differentiating

$\varphi(t_i) = (t_1 x + t_2 x^2 + \cdots)^{r_1}$

$a_i$ times with respect to $t_i$ ($i = 1, 2, \ldots, n_1$), then finding the coefficient of
$x^{n_1}$ after putting $t_i = 1$. The identity (3.3) enables us to find the factorial
moments of the variables in the distribution (2.9) for we have

(3.4) $E\left(\prod_i r_{1i}^{(a_i)}\right) = \sum \prod_i r_{1i}^{(a_i)} \begin{bmatrix} r_1 \\ r_{1i} \end{bmatrix} \binom{n_2 + 1}{r_1} \Big/ \binom{n}{n_1}$

$\quad = \sum_{r_1} r_1^{(\sum a_i)} \binom{n_1 - \sum i a_i - 1}{r_1 - \sum a_i - 1} \binom{n_2 + 1}{r_1} \Big/ \binom{n}{n_1}$

$\quad = (n_2 + 1)^{(\sum a_i)} \sum_{r_1} \binom{n_1 - \sum i a_i - 1}{r_1 - \sum a_i - 1} \binom{n_2 - \sum a_i + 1}{r_1 - \sum a_i} \Big/ \binom{n}{n_1}$

$\quad = (n_2 + 1)^{(\sum a_i)} \binom{n - \sum (i + 1) a_i}{n_1 - \sum i a_i} \Big/ \binom{n}{n_1}.$

The sum on $r_1$ involved in the last step is given by the identity

(3.5) $\sum_i \binom{A}{c + i} \binom{B}{i} = \binom{A + B}{B + c},$

which is readily obtained by equating coefficients of $x^c$ in

$(1 + x)^A \left(1 + \dfrac{1}{x}\right)^B = \dfrac{(1 + x)^{A + B}}{x^B}.$

We shall give here the means, variances and covariances obtained from (3.4)

(3.6) $E(r_{1i}) = (n_2 + 1)^{(2)} n_1^{(i)} / n^{(i+1)},$

(3.7) $\sigma_{r_{1i} r_{1j}} = \dfrac{n_2^{(2)} (n_2 + 1)^{(2)} n_1^{(i+j)}}{n^{(i+j+2)}} - \dfrac{n_2^2 (n_2 + 1)^2 n_1^{(i)} n_1^{(j)}}{n^{(i+1)} n^{(j+1)}},$

(3.8) $\sigma^2_{r_{1i}} = \dfrac{n_2^{(2)} (n_2 + 1)^{(2)} n_1^{(2i)}}{n^{(2i+2)}} + \dfrac{(n_2 + 1)^{(2)} n_1^{(i)}}{n^{(i+1)}} \left(1 - \dfrac{(n_2 + 1)^{(2)} n_1^{(i)}}{n^{(i+1)}}\right).$

These will be needed in the section dealing with asymptotic distributions. The
moments for the distribution (2.6) follow at once from (3.3) as

(3.9) $E\left(\prod_i r_{1i}^{(a_i)} \prod_j r_{2j}^{(b_j)}\right) = \sum_{r_1, r_2} r_1^{(\sum a_i)} r_2^{(\sum b_j)} \binom{n_1 - \sum i a_i - 1}{r_1 - \sum a_i - 1} \binom{n_2 - \sum j b_j - 1}{r_2 - \sum b_j - 1} F(r_1, r_2) \Big/ \binom{n}{n_1}.$

The summation on $r_2$ is accomplished by putting $r_2 = r_1 - 1$, $r_1$, and $r_1 + 1$,
but after that has been done it is necessary to expand the product of the two
factorial factors in factorial powers of the lower index of one of the binomial
coefficients. This is easily done for the first few moments, but there appears
to be no simple expression for the general case. The means, variances and
covariances of $r_{1i}$ are given by (3.6), (3.7) and (3.8) and those of $r_{2i}$ are obtained
from these equations by interchanging $n_1$ and $n_2$. The other covariances are

(3.10) $\sigma_{r_{1i} r_{2j}} = \dfrac{n_1^{(i+2)} n_2^{(j+2)}}{n^{(i+j+2)}} + 4\,\dfrac{n_1^{(i+1)} n_2^{(j+1)}}{n^{(i+j+1)}} + 2\,\dfrac{n_1^{(i)} n_2^{(j)}}{n^{(i+j)}} - \dfrac{(n_1 + 1)^{(2)} (n_2 + 1)^{(2)} n_1^{(i)} n_2^{(j)}}{n^{(i+1)} n^{(j+1)}}.$

A slight variation of the method above will give the moments of the $s_{1i}$ in
the distribution (2.13). An accent on a summation sign will indicate that the
term corresponding to i = k is to be omitted. Differentiating

$\varphi = [t_1 x + t_2 x^2 + \cdots + t_{k-1} x^{k-1} + t_k (x^k + x^{k+1} + \cdots)]^{s_1}$

$a_i$ times with respect to $t_i$ and finding the coefficient of $x^{n_1}$ after putting $t_i = 1$,
we obtain

(3.11) $\sum \prod_i s_{1i}^{(a_i)} \begin{bmatrix} s_1 \\ s_{1i} \end{bmatrix} \binom{n_1 - A - (k-1)s_{1k} - 1}{s_{1k} - 1} = s_1^{(\sum a_i)} \binom{n_1 - \sum i a_i + a_k - 1}{s_1 - \sum' a_i - 1}.$

This with (2.13) gives by the same steps as used in obtaining (3.4)

(3.12) $E\left(\prod_i s_{1i}^{(a_i)}\right) = (n_2 + 1)^{(\sum a_i)} \binom{n - \sum i a_i - \sum' a_i}{n_1 - \sum i a_i} \Big/ \binom{n}{n_1}.$

The first two moments are

(3.13) $E(s_{1k}) = \dfrac{(n_2 + 1) n_1^{(k)}}{n^{(k)}},$

(3.14) $\sigma_{s_{1i} s_{1k}} = \dfrac{n_2^2 (n_2 + 1) n_1^{(i+k)}}{n^{(i+k+1)}} - \dfrac{n_2 (n_2 + 1)^2 n_1^{(i)} n_1^{(k)}}{n^{(i+1)} n^{(k)}},$

(3.15) $\sigma^2_{s_{1k}} = \dfrac{(n_2 + 1)^{(2)} n_1^{(2k)}}{n^{(2k)}} + \dfrac{(n_2 + 1) n_1^{(k)}}{n^{(k)}} \left(1 - \dfrac{(n_2 + 1) n_1^{(k)}}{n^{(k)}}\right).$

The others are, of course, given by (3.6), (3.7) and (3.8).

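The joint moments of the variables in (2.14) as obtained from (3.11) are

(3.16) $E\left(\prod_i s_{1i}^{(a_i)} \prod_j s_{2j}^{(b_j)}\right) = \sum_{s_1, s_2} s_1^{(\sum a_i)} s_2^{(\sum b_j)} \binom{n_1 - \sum i a_i + a_k - 1}{s_1 - \sum' a_i - 1} \binom{n_2 - \sum j b_j + b_h - 1}{s_2 - \sum' b_j - 1} F(s_1, s_2) \Big/ \binom{n}{n_1}.$

In addition to the covariances (3.10) we shall need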


(3-17) 


- ni k+i) ni’' + " + 2nj t+1) fa ,+1> , _ fa i+1) fa’> + fa»fa 

,U ‘ i; Tj(k+J+1) ' n (t+)) ' 


fa + l) (a fa + 1) (2) fa w n 2 J) 


»CM n (,+1 > * 

a 181 „ = ^y +1) < hl) _u o _ fa + Dfa + lfaM 

n (*+M n (*+A-i> n (k) n (l,) 


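The moments of $r_1$ in the distribution (2.11) may be derived easily by means
of (3.6) as

(3.19) $E(r_1^{(a)}) = (n_2 + 1)^{(a)} \binom{n - a}{n_1 - a} \Big/ \binom{n}{n_1}$.

From which

(3.20) $E(r_1) = \dfrac{(n_2 + 1)\, n_1}{n}$,

(3.21) $\sigma_{r_1}^2 = \dfrac{(n_2 + 1)^{(2)}\, n_1^{(2)}}{n\, n^{(2)}}$.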
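As an illustrative check (not part of the original derivation), the mean and variance of the number of runs of one kind of element can be compared with direct enumeration. The sketch below assumes the readings $E(r_1) = (n_2+1)n_1/n$ and $\sigma^2 = (n_2+1)^{(2)} n_1^{(2)} / (n\, n^{(2)})$; the function names are hypothetical.

```python
from fractions import Fraction
from itertools import combinations


def falling(m, a):
    """Factorial power m^(a) = m (m - 1) ... (m - a + 1)."""
    out = 1
    for t in range(a):
        out *= m - t
    return out


def runs_of_first_kind(positions):
    """Number of maximal runs formed by the elements occupying `positions`."""
    chosen = set(positions)
    return sum(1 for i in chosen if i - 1 not in chosen)


def exact_mean_var(n1, n2):
    """Mean and variance of r_1 over all equally likely arrangements."""
    n = n1 + n2
    counts = [runs_of_first_kind(pos) for pos in combinations(range(n), n1)]
    mean = Fraction(sum(counts), len(counts))
    var = Fraction(sum(c * c for c in counts), len(counts)) - mean ** 2
    return mean, var


def formula_mean_var(n1, n2):
    """Closed forms assumed from (3.20) and (3.21)."""
    n = n1 + n2
    mean = Fraction((n2 + 1) * n1, n)
    var = Fraction(falling(n2 + 1, 2) * falling(n1, 2), n * falling(n, 2))
    return mean, var
```

For small $n_1$, $n_2$ the two functions can be compared term by term, which is a convenient guard against misread factorial powers.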


4. Distribution and moments of runs of k kinds of elements. This section
is a generalization of the preceding two sections to several kinds of elements.
The case $k = 2$ was treated separately because the special character of the
function $F(r_1, r_2)$ in this instance made the distribution comparatively simple.
Now we shall be interested in $k$ kinds of elements denoted by $a_1, \cdots, a_k$ and
we shall suppose there are $n_i$ elements of the $i$th kind. We let $r_{ij}$ denote the
number of runs of elements of the $i$th kind of length $j$, and put

$n = \sum_{i=1}^{k} n_i, \qquad r_i = \sum_{j=1}^{n_i} r_{ij}.$

The same argument as was used in deriving (2.6) gives 
where the function $F(r_1, r_2, \cdots, r_k)$, which will be referred to hereafter simply
as $F(r_i)$, represents the number of different arrangements of $r_1$ objects of one
kind, $r_2$ objects of a second kind, and so forth, such that no two adjacent objects
are of the same kind. We shall be able to give the explicit expression for $F(r_i)$
after examining the marginal distribution $P(r_i)$. This is obtained by summing
(4.1) over $r_{ij}$ with $r_i$ fixed by means of the identity (2.7), giving


(4.2) $P(r_i) = \prod_{i=1}^{k} \binom{n_i - 1}{r_i - 1} F(r_i) \Big/ \left[{n \atop n_i}\right]$.


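Despite our present meager knowledge of $F(r_i)$ it is possible to find the
moments of the $r_i$ as distributed by (4.2). Since $\sum P(r_i) = 1$, we have the
identity

(4.3) $\sum_{r_i} \prod_{i=1}^{k} \binom{n_i - 1}{r_i - 1} F(r_i) = \left[{n \atop n_i}\right]$.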

From this the moments are easily derived. If we put 


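(4.4) $u_i = n_i - r_i$

we have

$\sum_{r_i} \prod_i \binom{n_i - 1}{r_i - 1} u_i^{(a_i)} F(r_i) = \prod_i (n_i - 1)^{(a_i)} \sum_{r_i} \prod_i \binom{n_i - a_i - 1}{r_i - 1} F(r_i)$

$\qquad = \prod_i (n_i - 1)^{(a_i)} \left[{n - \sum a_i \atop n_i - a_i}\right]$.

The summation involved in the last step is given by (4.3). On dividing the
last equation by $\left[{n \atop n_i}\right]$ we get the factorial moments of the $u_i$,

(4.5) $E\left(\prod_i u_i^{(a_i)}\right) = \prod_i (n_i - 1)^{(a_i)} \left[{n - \sum a_i \atop n_i - a_i}\right] \Big/ \left[{n \atop n_i}\right]$.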
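The first and second moments implied by (4.5) can be checked by enumeration for small cases. The sketch below is an illustration only; it assumes the consequences $E(r_i) = n_i(n - n_i + 1)/n$, $\sigma_{ii} = n_i^{(2)}(n - n_i + 1)^{(2)}/(n\, n^{(2)})$ and $\sigma_{ij} = n_i^{(2)} n_j^{(2)}/(n\, n^{(2)})$, and all function names are hypothetical.

```python
from fractions import Fraction
from itertools import permutations


def falling2(m):
    """Second factorial power m^(2) = m (m - 1)."""
    return m * (m - 1)


def run_counts(seq, k):
    """Total number of runs of each of the k kinds in one arrangement."""
    counts = [0] * k
    for t, s in enumerate(seq):
        if t == 0 or seq[t - 1] != s:
            counts[s] += 1
    return counts


def brute_force(ns):
    """Exact means and covariances of (r_1, ..., r_k) by enumeration.

    Every distinct arrangement appears equally often among the permutations
    of the labelled elements, so averaging over permutations is exact.
    """
    k = len(ns)
    pool = [kind for kind, m in enumerate(ns) for _ in range(m)]
    rows = [run_counts(p, k) for p in permutations(pool)]
    size = len(rows)
    mean = [Fraction(sum(r[a] for r in rows), size) for a in range(k)]
    cov = [[Fraction(sum(r[a] * r[b] for r in rows), size) - mean[a] * mean[b]
            for b in range(k)] for a in range(k)]
    return mean, cov


def closed_form(ns):
    """Means, variances and covariances assumed to follow from (4.5)."""
    n = sum(ns)
    denom = n * falling2(n)
    mean = [Fraction(m * (n - m + 1), n) for m in ns]
    cov = [[Fraction(falling2(a) * (falling2(n - a + 1) if i == j else falling2(b)), denom)
            for j, b in enumerate(ns)] for i, a in enumerate(ns)]
    return mean, cov
```

Agreement for a few small compositions such as (2, 2, 1) gives some reassurance that the factorial powers have been read correctly.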

From these equations the moments of the $r_i$ may be found; the means, variances
and covariances are

(4.6) $E(r_i) = \dfrac{n_i (n - n_i + 1)}{n}$,

(4.7) $\sigma_{ii} = \dfrac{n_i^{(2)} (n - n_i + 1)^{(2)}}{n\, n^{(2)}}$,

(4.8) $\sigma_{ij} = \dfrac{n_i^{(2)} n_j^{(2)}}{n\, n^{(2)}}$.

It is clear that

(4.9) $\varphi(t_i)$ = Coefficient of $\prod_1^k x_i^{n_i}$ in

$(x_1 + \cdots + x_k)^k \prod_{i=1}^{k} (x_1 + \cdots + x_{i-1} + t_i x_i + x_{i+1} + \cdots + x_k)^{n_i - 1} \Big/ \left[{n \atop n_i}\right]$

is a generating function for the moments of the variables $u_i$. This generating
function will enable us to find the exact expression for $F(r_i)$, for we have

$P(u_i = n_i - r_i)$ = Coefficient of $\prod_1^k t_i^{n_i - r_i}$ in $\varphi(t_i)$

jl HsM/t:]- 


ntj-nf-aj 


Also 


-n ft. 0™/[»".] 

and equating the expressions on the right of the last two equations we have 


(4.10) 


(4.11) 


F(r<) 


n 

& [*.] 5 [ r, »T, *] 


in which the prime on the $\prod'$ indicates that the indices corresponding to $j = i$
are to be omitted; hence $i$ takes all the values $1, 2, \cdots, k$ and $j$ takes all values
$1, 2, \cdots, k$ except $i$, because the index $n_{ii}$ has been cancelled with $n_i - r_i$ in
the binomial coefficient in the denominator of (4.10). It is clear from (4.11)
that $F(r_i)$ may be expressed as follows

(4.12) $F(r_i) = \mathrm{CT} \prod_1^k x_i^{-r_i}\, (x_1 + \cdots + x_k)^k (x_2 + \cdots + x_k)^{r_1 - 1}
(x_1 + x_3 + \cdots + x_k)^{r_2 - 1} \cdots (x_1 + \cdots + x_{k-1})^{r_k - 1}$

in which “CT” is an abbreviation for “constant term of.”
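The constant-term expression for $F(r_i)$ can be compared with a direct count of arrangements having no two adjacent objects of the same kind. The following is a minimal sketch under the assumption that (4.12) reads: the coefficient of $\prod x_i^{r_i}$ in $(x_1 + \cdots + x_k)^k \prod_i (x_1 + \cdots + x_k - x_i)^{r_i - 1}$. The helper names are hypothetical, and the polynomial arithmetic is a plain dictionary implementation rather than any particular library.

```python
from collections import defaultdict
from itertools import permutations


def brute_force_F(rs):
    """Count arrangements of rs[i] objects of kind i with no equal neighbours."""
    pool = [kind for kind, m in enumerate(rs) for _ in range(m)]
    distinct = set(permutations(pool))
    return sum(all(a != b for a, b in zip(p, p[1:])) for p in distinct)


def poly_mul(p, q):
    """Multiply polynomials stored as {exponent tuple: coefficient}."""
    out = defaultdict(int)
    for ea, ca in p.items():
        for eb, cb in q.items():
            out[tuple(x + y for x, y in zip(ea, eb))] += ca * cb
    return dict(out)


def linear_form(k, skip=None):
    """x_1 + ... + x_k, optionally omitting the variable with index `skip`."""
    return {tuple(int(j == i) for j in range(k)): 1 for i in range(k) if i != skip}


def constant_term_F(rs):
    """Coefficient of prod x_i^{r_i} in S^k * prod_i (S - x_i)^{r_i - 1}."""
    k = len(rs)
    poly = {tuple([0] * k): 1}
    for _ in range(k):
        poly = poly_mul(poly, linear_form(k))
    for i, r in enumerate(rs):
        for _ in range(r - 1):
            poly = poly_mul(poly, linear_form(k, skip=i))
    return poly.get(tuple(rs), 0)
```

For example, with two objects of one kind and one each of two others, both functions count the six admissible arrangements.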
We are now in a position to obtain moments of the variables $r_{ij}$ in the distribution
(4.1) by means of identities similar to (4.3). As an illustration we compute

k 


2 (:;::: i) n ft : l) m - 2 ft:;:;) n ft: l) 


.crn*rn (*!+• 

1 1 

= C2’f[0(^ + 

1 

_ fnl (n - ni) (a> 


“1 

J1.-0 


+ t,X t + ’ • • + XkY' 1 

■ + Xk) n a {%i + • ■ • + a:*) 0 


n« 


or 


(4.13) z ft: “: J) n ft:j) no - . 

n n »! 

■ 1 

The moments of $r_{ij}$ may be computed from identities of this type together with
(3.3). The first two moments are

(4.14) $E(r_{ij}) = (n - n_i + 1)^{(2)}\, n_i^{(j)} / n^{(j+1)}$

(4.15) $E(r_{ij}^{(2)}) = n_i^{(2j)} (n - n_i)^{(2)} (n - n_i + 1)^{(2)} / n^{(2j+2)}$

(4.16) $E(r_{ij} r_{it}) = n_i^{(j+t)} (n - n_i)^{(2)} (n - n_i + 1)^{(2)} / n^{(j+t+2)}, \qquad j \ne t$


E(r t ,r. t ) = (n, - j - 1) (ft. - f — 1) 


TjCH-H-2) 


{(n. —j + 1)® (n, — f +1) 


\(S) 


+ 2 (n — n,- — ra.) (n, — j + 1) (ra. — t + 1) (ra, — t + n, — j) 

+ (n — n, — n,)®[(«, — t + 1)® + 2 (ra< — ] + l)(n, — f + 1) 

+ (n< — j + 1)®] + 2(n — n, — n,) <s) (n< — j + n, — t + 2) 

+ (n — ra, - ra,) c<) J + 2(ra, - j - 1) n ^(,+ 7 ^ {(ra< - i + 1) 

.(n. - 4 + 1)® + (» - rn - n,)[ 2(ra, - j + l)(n« - t + 1) 

+ (ra, — 4 + 1)®] + (ra — ra, — n,)®[2(ra, -r 4 + 1) + (n, — j + 1)] 

+ (n — rii — n,) l3> } + 2 (ra, — 4 — 1) ' n()+i+1) {(ra, — 4 + 1) 

. (ra,- — j + 1)® + («■ — 7i< — ra,)[2(ra, — j + l)(ra, — 4 + 1) 

+ (ra, - j + 1)®] + (ra - ra, - ra.)®[2(ra, — j + 1) + (». - 4 + 1)1 

+ (» - ». - n.) (,, | + 4 _ j + 1)(n * - 1 + 

+ (n — n, — ra,)(«( — j + ft* — 4 + 2) + (ft — »,■ — ra,) ®}. 
Such a lengthy expression as this last one can hardly be useful to the statistician
and for this reason we shall not define variables $s_{ij}$ analogous to the $s_{1i}$ and $s_{2i}$
of Section 2 and take the time and space to find their moments.


5. Asymptotic distributions. We shall show that some of the distributions
obtained previously are asymptotically normal when the $n_i$ become large in
such a way that the ratios $n_i/n$ remain fixed. The description “asymptotically
normal” means that the distribution approaches the normal distribution uniformly
over any finite region as $n_i \to \infty$. The ratios $n_i/n$ will be denoted by
$e_i$, hence $\sum e_i = 1$. The symbol $O(1/n^a)$ will represent any function such that

$\lim_{n \to \infty} n^a\, O\!\left(\dfrac{1}{n^a}\right) = L < \infty.$

We shall not, of course, be able to get any limit theorems for distributions
like (2.6) or (2.9) because the number of independent variables increases with
$n$. We shall consider first the distribution (2.13) whose asymptotic character
is given in the following theorem.

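Theorem 1. The variables

(5.1) $x_i = \dfrac{s_{1i} - n e_1^i e_2^2}{\sqrt{n}}, \quad i < k; \qquad x_k = \dfrac{s_{1k} - n e_1^k e_2}{\sqrt{n}}$

are asymptotically normally distributed, with zero means and variances and
covariances

(5.2)
$\sigma_{ij} = e_1^{i+j-1} e_2^3 [(i + 1)(j + 1) e_1 e_2 - ij e_2 - 2e_1], \quad i, j < k,\ i \ne j$,
$\sigma_{ii} = e_1^{2i-1} e_2^3 [(i + 1)^2 e_1 e_2 - i^2 e_2 - 2e_1] + e_1^i e_2^2, \quad i < k$,
$\sigma_{ik} = e_1^{i+k-1} e_2^2 [(i + 1) k e_1 e_2 - ik e_2 - e_1], \quad i < k$,
$\sigma_{kk} = e_1^{2k-1} e_2 [k^2 (e_1 - 1) e_2 - e_1] + e_1^k e_2$.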

The limiting means, variances and covariances are obtained from the relations 
(3.6), (3.7), (3.8), (3.13), (3.14) and (3.15). 
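One consistency check on the limiting covariances is that the variables $x_1, \cdots, x_k$ add up to the standardized total number of runs of the first kind, whose limiting variance is $e_1^2 e_2^2$. The sketch below is only an illustration; it assumes the readings $\sigma_{ij} = e_1^{i+j-1} e_2^3[(i+1)(j+1)e_1e_2 - ij e_2 - 2e_1]$ (plus $e_1^i e_2^2$ when $i = j$), $\sigma_{ik} = e_1^{i+k-1} e_2^2[(i+1)k e_1 e_2 - ik e_2 - e_1]$ and $\sigma_{kk} = e_1^{2k-1} e_2[k^2(e_1 - 1)e_2 - e_1] + e_1^k e_2$, and the function names are hypothetical.

```python
from fractions import Fraction


def sigma(i, j, k, e1):
    """Limiting covariance of x_i and x_j (1 <= i, j <= k), as assumed above."""
    e1 = Fraction(e1)
    e2 = 1 - e1
    if i > j:
        i, j = j, i
    if j < k:
        value = e1 ** (i + j - 1) * e2 ** 3 * ((i + 1) * (j + 1) * e1 * e2 - i * j * e2 - 2 * e1)
        if i == j:
            value += e1 ** i * e2 ** 2
        return value
    if i < k:  # here j == k
        return e1 ** (i + k - 1) * e2 ** 2 * ((i + 1) * k * e1 * e2 - i * k * e2 - e1)
    return e1 ** (2 * k - 1) * e2 * (k * k * (e1 - 1) * e2 - e1) + e1 ** k * e2


def variance_of_sum(k, e1):
    """Limiting variance of x_1 + ... + x_k."""
    return sum(sigma(i, j, k, e1) for i in range(1, k + 1) for j in range(1, k + 1))
```

With exact rational arithmetic the sum reduces to $e_1^2 e_2^2$ for every cut-off $k$ tried, which is what the variance of the total number of runs of one kind requires.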

To demonstrate this theorem we make the substitutions 


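(5.3)
$n_i = n e_i, \qquad i = 1, 2$,
$s_{1i} = n e_1^i e_2^2 + \sqrt{n}\, x_i, \qquad i = 1, 2, \cdots, k - 1$,
$s_{1k} = n e_1^k e_2 + \sqrt{n}\, x_k$,
$s_1 = n e_1 e_2 + \sqrt{n} \sum_1^k x_i$,
$\sum_1^{k-1} i\, s_{1i} = n(e_1 - e_1^{k+1} - k e_1^k e_2) + \sqrt{n} \sum_1^{k-1} i x_i$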

in (2.13), and estimate the factorials by means of Stirling’s formula 
(5.4) $m! = \sqrt{2\pi}\, m^{m + \frac{1}{2}} e^{-m} \left(1 + O\!\left(\dfrac{1}{m}\right)\right)$.

The result is an unwieldy expression which we shall not present at the moment.
First we note that the exponential factors cancel out because the sum of the
lower indices of a binomial or multinomial coefficient is equal to the upper index.
Also we simplify the expression by considering in detail only terms which involve
the $x_i$; the normalizing constant can be determined from the final limit function.
Any function of the parameters will be represented by the letter $K$. Thus in
(5.4) we need consider only the factor $m^{m + \frac{1}{2}}$. All factorials will be of the form

(5.5) $m = na + \sqrt{n}\, L(x) + b$

where $L(x)$ is a linear function of the $x_i$, and $a$ and $b$ are independent of $n$ and
$x_i$. Now


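$m^{m + \frac{1}{2}} = (na + \sqrt{n} L(x) + b)^{na + \sqrt{n} L(x) + b + \frac{1}{2}}$

$\quad = (na)^{na + \sqrt{n} L(x) + b + \frac{1}{2}} \left(1 + \dfrac{L(x)}{a\sqrt{n}} + \dfrac{b}{an}\right)^{na + \sqrt{n} L(x) + b + \frac{1}{2}}$

$\quad = K (na)^{\sqrt{n} L(x)} \left(1 + \dfrac{L(x)}{a\sqrt{n}} + \dfrac{b}{an}\right)^{na + \sqrt{n} L(x) + b + \frac{1}{2}}$

and

$\log m^{m + \frac{1}{2}} = K + \sqrt{n} L(x) \log na + (na + \sqrt{n} L(x) + b + \tfrac{1}{2}) \log\left(1 + \dfrac{L(x)}{a\sqrt{n}} + \dfrac{b}{an}\right)$

$\quad = K + \sqrt{n} L(x) \log na + (na + \sqrt{n} L(x) + b + \tfrac{1}{2}) \left(\dfrac{L(x)}{a\sqrt{n}} + \dfrac{b}{an} - \dfrac{L^2(x)}{2a^2 n} + O\!\left(\dfrac{1}{n^{3/2}}\right)\right)$

(5.6) $\quad = K + \sqrt{n} L(x)(1 + \log na) + \dfrac{1}{2a} L^2(x) + O\!\left(\dfrac{1}{\sqrt{n}}\right)$

so terms arising from $b$ (and $b + \tfrac{1}{2}$ in the exponent) will be neglected as they
give rise only to terms independent of the $x_i$ or of order $1/\sqrt{n}$. Of course
$\log(1 + O(1/m)) = O(1/m)$. Thus, keeping significant terms only, the result of the
substitutions (5.3) and (5.4) in (2.13) after taking logarithms and using (5.6) is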

k—l ^ 

-log P(u) = K + y/n Z ( io g nei4 + 1) + I- 
— y/n (jE, (log nel + 1 ) + ^5 (E x ^j 
(5.7) + Vn w. + Qc ~ 1 )xkj (log ne\ + 1) - ^ ix, + (& - l)x fc Y 

+ 2 s/nxh (log ne{e 2 + 1) + -P- — s/n (22 (log ne? +1 + 1) 

eie 2 \ i / 

+ ip (?’**) +0 (t;)- 

The coefficients of $x_i$ ($i < k$) and $x_k$ are

Vw(l°g ne i e l + 1 — log ne\ — 1 + i log ne\ + i — i log ne\ +l — i) = 0, 
Vn( — log nel - 1 + k log nel + k + 2 log ne\e 2 + 2 — k log nei +1 — k) - 0, 
Hence only the quadratic terms remain and we have 

(5.8) $-\log P = K + \tfrac{1}{2} \sum_{i,j=1}^{k} \sigma^{ij} x_i x_j + O\!\left(\dfrac{1}{\sqrt{n}}\right)$

where

(5.9)

_ 1 , ije 2 

1 T k +1 

h 3 < *, 

e 2 e 2 


1 4 . 1 i 
el + e[el * e\ +1 

i < fc, 

_ 1 , i + t(k — l)e, 

A ‘ 1+1 

i < k, 


e a 


ei 


** - 1 -l 2 a. ** ( fc - 0 * 


It is merely a matter of straightforward multiplication of the two matrices to
verify that $\|\sigma^{ij}\|$ is the inverse of $\|\sigma_{ij}\|$, hence is a positive definite matrix.
The details of the verification will be omitted. We have then


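(5.10) $P = K e^{-\frac{1}{2} \sum \sigma^{ij} x_i x_j} \left(1 + O\!\left(\dfrac{1}{\sqrt{n}}\right)\right)$.

In this equation $K$ must necessarily contain the factor $(1/\sqrt{n})^k$ because there
are $k + 5$ factorials in the denominator and 5 in the numerator of (2.13).
Since $\Delta s_{1i} = 1$, this factor, in view of (5.1), may be replaced by $\prod \Delta x_i$, so

(5.11) $P = K e^{-\frac{1}{2} \sum \sigma^{ij} x_i x_j} \prod \Delta x_i \left(1 + O\!\left(\dfrac{1}{\sqrt{n}}\right)\right)$.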

If we restrict the $x_i$ to any finite region $R$ in the $x$-space, the function $O(1/\sqrt{n})$
approaches zero uniformly as $n \to \infty$. Thus, if $A_i < B_i$ are any positive
numbers such that the corresponding values of $x_i$, say $a_i$ and $b_i$, obtained by
substituting $A_i$ and $B_i$ for $s_{1i}$ in (5.1), determine a rectangular region $R'$ ($a_i < x_i <
b_i$) which lies in $R$, we have

(5.12) $\sum_{s_{1i} = A_i}^{B_i} P(s_{1i}) = \sum_{x_i = a_i}^{b_i} K e^{-\frac{1}{2} \sum \sigma^{ij} x_i x_j} \prod \Delta x_i \left(1 + O\!\left(\dfrac{1}{\sqrt{n}}\right)\right)
\to \int_{R'} K e^{-\frac{1}{2} \sum \sigma^{ij} x_i x_j} \prod dx_i$

by the definition of a definite integral and Riemann’s fundamental theorem.

We have given some details of this proof in order that it may serve as a model
for other theorems of a similar nature which will appear later, and for which
a complete proof will not be given. Two immediate consequences of Theorem 1
will now be stated as corollaries.

Corollary 1. The variable

$x = \dfrac{r - n e_1 e_2}{\sqrt{n}\, e_1 e_2}$

where $r$ is the total number of runs of one kind of element, is asymptotically normally
distributed with zero mean and unit variance. The limiting mean and variance
were computed from (3.20) and (3.21).

Corollary 2. The variable $Q = \sum \sigma^{ij} x_i x_j$ is asymptotically distributed
according to the $\chi^2$-law with $k$ degrees of freedom.

In exactly the same manner in which Theorem 1 was deduced from (2.13), 
we may prove the following theorem corresponding to the distribution (2.14). 

Theorem 2. The variables

(5.13)
$x_i = \dfrac{s_{1i} - n e_1^i e_2^2}{\sqrt{n}}, \qquad i < k$,
$x_k = \dfrac{s_{1k} - n e_1^k e_2}{\sqrt{n}}$,
$y_i = \dfrac{s_{2i} - n e_1^2 e_2^i}{\sqrt{n}}, \qquad i < h$


are asymptotically normally distributed with zero means and variances and
covariances

$\sigma_{x_i x_j} = e_1^{i+j-1} e_2^3 [(i + 1)(j + 1) e_1 e_2 - ij e_2 - 2e_1], \qquad i, j < k$,
$\sigma_{x_i x_i} = e_1^{2i-1} e_2^3 [(i + 1)^2 e_1 e_2 - i^2 e_2 - 2e_1] + e_1^i e_2^2, \qquad i < k$,
$\sigma_{x_i x_k} = e_1^{i+k-1} e_2^2 [(i + 1) k e_1 e_2 - ik e_2 - e_1], \qquad i < k$,
$\sigma_{x_k x_k} = e_1^{2k-1} e_2 [k^2 (e_1 - 1) e_2 - e_1] + e_1^k e_2$,

(5.14)
$\sigma_{y_i y_j} = e_1^3 e_2^{i+j-1} [(i + 1)(j + 1) e_1 e_2 - ij e_1 - 2e_2], \qquad i, j < h$,
$\sigma_{y_i y_i} = e_1^3 e_2^{2i-1} [(i + 1)^2 e_1 e_2 - i^2 e_1 - 2e_2] + e_1^2 e_2^i, \qquad i < h$,
a x ,vj — ei +1 e2 +1 [0 + 1)0 + l)cifi2 — 2ie2 — 2jei + 4eie 2 -f- 2] 

i <kj <h, 

<Tx t vi ~ £ +1 el[k(j + l)cifij ~ 2(k - l)e 2 - O’ “ l)ei + 2eic 2 ] j < h. 


These limiting variances were computed from the variances and covariances
given in Section 3. We have chosen the variable $s_{2h}$ of (2.14) as the dependent
variable. The proof of this theorem is omitted. From it the following corollaries
are deduced immediately.

Corollary 3. If $u_i = x_i$ and $u_{k+i} = y_i$ of (5.13) and $\|\sigma^{ij}\|$ ($i, j = 1, 2,
\cdots, k + h - 1$) denotes the inverse of (5.14), then the variable $Q = \sum \sigma^{ij} u_i u_j$ is
asymptotically distributed according to the $\chi^2$-law with $k + h - 1$ degrees of freedom.

Corollary 4. If $s_i = s_{1i} + s_{2i}$ denotes the total number of runs of both kinds of
elements of length $i$, and $s_k$ the total number of runs of length greater than $k - 1$,
then the variables

(5.15)
$x_i = \dfrac{s_i - n(e_1^i e_2^2 + e_1^2 e_2^i)}{\sqrt{n}}, \qquad i < k$,
$x_k = \dfrac{s_k - n(e_1^k e_2 + e_1 e_2^k)}{\sqrt{n}}$

are asymptotically normally distributed with zero means and variances and
covariances

(5.16) $\sigma_{ij} = \sigma_{x_i x_j} + \sigma_{x_i y_j} + \sigma_{y_i x_j} + \sigma_{y_i y_j}$.

We have put $h = k$ in Theorem 2 to obtain this result. The terms on the right
of (5.16) are defined by (5.14); terms which do not appear there may be found
by interchanging $e_1$ and $e_2$ in one of the relations. For example, $\sigma_{y_k y_k}$ is given by
interchanging $e_1$ and $e_2$ in the fourth equation of the set (5.14).

Corollary 5. The variable $Q = \sum \sigma^{ij} x_i x_j$, where the $x_i$ are defined by (5.15)
and $\|\sigma^{ij}\|$ is the inverse of (5.16), is asymptotically distributed according to the
$\chi^2$-law with $k$ degrees of freedom.

Corollary 6. If $s$ denotes the total number of runs of both kinds of elements,
then the variable

$x = \dfrac{s - 2n e_1 e_2}{2\sqrt{n}\, e_1 e_2}$

is asymptotically normally distributed with zero mean and unit variance. This is
the result derived by Wald and Wolfowitz [13].


6. Asymptotic distributions for k kinds of elements. We now investigate the 
asymptotic character of the distribution (4.2) 
(6.1) $P(r_i) = \prod_{i=1}^{k} \binom{n_i - 1}{r_i - 1} F(r_i) \Big/ \left[{n \atop n_i}\right]$

where $r_i$ is the total number of runs of the $i$th kind of element.
Theorem 1. If $k > 2$, the variables

(6.2) $x_i = \dfrac{r_i - n e_i (1 - e_i)}{\sqrt{n}}$

are asymptotically normally distributed with zero means and variances and
covariances

(6.3) $\sigma_{ij} = e_i^2 e_j^2, \qquad \sigma_{ii} = e_i^2 (1 - e_i)^2$.


The restriction $k > 2$ is made because in the case $k = 2$ the correlation between
the two variables approaches one, and the numbers $\sigma_{ij}$ are all equal. The result
may be called a degenerate normal distribution and might be included in the
theorem in this sense; we have chosen to omit it because this case is better taken
care of by Corollary 1 of the previous section.

The proof of this theorem will be simplified if in the moments (4.5) we replace
the numbers $n_i - 1$ by $n_i$. This substitution will not, of course, affect the
limiting moments. Hence we consider the variables $v_i$ with moments given by



are asymptotically normally distributed with zero means and variances and co- 
variance (6.3). It is possible to prove this statement by showing that the 
characteristic function (Fourier transform) obtained by substituting id, for fi 
in the moment generating function 


$\varphi_v(t_i)$ = Coef. of $\prod x_i^{n_i}$ in

(6.6) $\prod_{i=1}^{k} (x_1 + \cdots + x_{i-1} + t_i x_i + x_{i+1} + \cdots + x_k)^{n_i} \Big/ \left[{n \atop n_i}\right]$


approaches 
as $n \to \infty$. This method is not appropriate for proving a similar theorem
which appears in Part II, and we prefer to give here a demonstration that will
suffice for both theorems.

In order to prove our theorem we consider the general term in the coefficient
of $\prod x_i^{n_i}$ in (6.6)

in which

(6.8) $\sum_{i=1}^{k} m_{ij} = n_j$

must be required as well as the usual restriction on indices of a multinomial
coefficient, $\sum_{j=1}^{k} m_{ij} = n_i$. Therefore only $(k - 1)^2$ of the indices are independent.
Clearly $m_{ii} = v_i$. Now without concerning ourselves about the statistical
significance of the variables $m_{ij}$, let us consider their distribution

(6.9) $\prod_{i=1}^{k} \left[{n_i \atop m_{ij}}\right] \Big/ \left[{n \atop n_i}\right]$

in which the variables corresponding to the values $i, j = 1, 2, \cdots, k - 1$ will
be chosen as the independent ones. We shall now prove a theorem from
which Theorem 1 follows immediately.

Theorem 2. The variables

(6.10) $x_{ij} = \dfrac{m_{ij} - n e_i e_j}{\sqrt{n}}, \qquad i, j = 1, 2, \cdots, k - 1$

are asymptotically normally distributed with zero means and variances and
covariances given by

(6.11)
$\sigma_{ij,pq} = e_i e_j e_p e_q$,
$\sigma_{ij,iq} = -e_i (1 - e_i) e_j e_q$,
$\sigma_{ij,ij} = e_i e_j (1 - e_i)(1 - e_j)$.


First it is to be noted that the moments of the $m_{ij}$ are easily obtained from the
identity


e nM = [y 


sn^nhl-sn^'-’n 

0 i L m 'iJ i l 



* 



( 6 . 12 ) 

as follows 
and on dividing this last relation by $\left[{n \atop n_i}\right]$

(6.13) $E\left(\prod_{i,j} m_{ij}^{(a_{ij})}\right) = \prod_i n_i^{(\Sigma_j a_{ij})} \prod_j n_j^{(\Sigma_i a_{ij})} \Big/ n^{(\Sigma a_{ij})}$

from which the moments (6.11) and the means in (6.10) were computed.
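The distribution (6.9) is that of a $k \times k$ table of counts $m_{ij}$ whose row sums and column sums are both the $n_i$. For small cases it can be tabulated outright, which gives a way to test low-order moments. The sketch below is an illustration under that reading; the function names are hypothetical, and the checked values $E(m_{ij}) = n_i n_j / n$ and $E(m_{ij}^{(2)}) = n_i^{(2)} n_j^{(2)} / n^{(2)}$ are the special cases of the factorial moment formula assumed here.

```python
from fractions import Fraction
from itertools import product
from math import factorial


def compositions(total, parts):
    """All tuples of `parts` non-negative integers summing to `total`."""
    if parts == 1:
        yield (total,)
        return
    for first in range(total + 1):
        for rest in compositions(total - first, parts - 1):
            yield (first,) + rest


def multinomial(total, indices):
    """Multinomial coefficient total! / prod(indices!)."""
    out = factorial(total)
    for m in indices:
        out //= factorial(m)
    return out


def table_distribution(ns):
    """Probability of every table with row sums and column sums equal to ns."""
    k, n = len(ns), sum(ns)
    denom = multinomial(n, ns)
    dist = {}
    for rows in product(*(list(compositions(m, k)) for m in ns)):
        cols = tuple(sum(r[j] for r in rows) for j in range(k))
        if cols == tuple(ns):
            weight = 1
            for m, r in zip(ns, rows):
                weight *= multinomial(m, r)
            dist[rows] = Fraction(weight, denom)
    return dist


def expectation(dist, func):
    """Expected value of func(table) under the tabulated distribution."""
    return sum(p * func(rows) for rows, p in dist.items())
```

Here `expectation(dist, lambda m: m[0][1])` returns the mean of one cell, and the same helper handles any factorial moment by changing the lambda.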

The proof of the theorem is similar to that of Theorem 1 in Section 5. We
make the substitutions

$n_i = n e_i, \qquad m_{ki} = n_i - \sum_{j=1}^{k-1} m_{ji}$,

$m_{ik} = n_i - \sum_{j=1}^{k-1} m_{ij}, \qquad m_{kk} = 2 n_k + \sum_{i,j=1}^{k-1} m_{ij} - n$,

$m_{ij} = n e_i e_j + \sqrt{n}\, x_{ij}$,

in (6.9) and employ Stirling’s formula exactly as before. The details are too
similar to warrant repetition. The final result is

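(6.14) $D(m_{ij}) = K e^{-\frac{1}{2} \sum \sigma^{ij,pq} x_{ij} x_{pq}} \prod_{i,j=1}^{k-1} dx_{ij} \left(1 + O\!\left(\dfrac{1}{\sqrt{n}}\right)\right)$,

where $\|\sigma^{ij,pq}\|$ is the inverse of (6.11) and is defined by

$\sigma^{ij,pq} = \dfrac{1}{e_k^2}, \qquad \sigma^{ij,iq} = \dfrac{1}{e_i e_k} + \dfrac{1}{e_k^2}, \qquad \sigma^{ij,pj} = \dfrac{1}{e_j e_k} + \dfrac{1}{e_k^2}$,

$\sigma^{ij,ij} = \dfrac{1}{e_i e_j} + \dfrac{1}{e_i e_k} + \dfrac{1}{e_j e_k} + \dfrac{1}{e_k^2}$.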


Theorem 1 is a corollary of Theorem 2. Also we may state these additional
results:

Corollary 1. If $k$ ($\ge 3$) kinds of elements are arranged at random and $r$
denotes the total number of runs of all kinds of elements, then the variable

$\dfrac{r - n(1 - \sum e_i^2)}{\sqrt{n}}$

is asymptotically normally distributed with zero mean and variance

$\sigma^2 = \sum e_i^2 - 2 \sum e_i^3 + (\sum e_i^2)^2$

where $e_i$ is the proportion of elements of the $i$-th kind.
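The limiting variance in this corollary can be compared with the exact variance of the total number of runs at a large but finite $n$. The following sketch assumes the exact second moments $\sigma_{ii} = n_i^{(2)}(n - n_i + 1)^{(2)}/(n\, n^{(2)})$ and $\sigma_{ij} = n_i^{(2)} n_j^{(2)}/(n\, n^{(2)})$ as read from Section 4, and the limiting values $e_i^2(1 - e_i)^2$ and $e_i^2 e_j^2$; the function names are hypothetical.

```python
from fractions import Fraction


def falling2(m):
    """Second factorial power m^(2) = m (m - 1)."""
    return m * (m - 1)


def exact_variance_total_runs(ns):
    """Var(r_1 + ... + r_k) for fixed group sizes ns, summing all covariances."""
    n = sum(ns)
    total = Fraction(0)
    for i, a in enumerate(ns):
        for j, b in enumerate(ns):
            num = falling2(a) * (falling2(n - a + 1) if i == j else falling2(b))
            total += Fraction(num, n * falling2(n))
    return total


def limiting_variance(es):
    """sigma^2 = sum e_i^2 - 2 sum e_i^3 + (sum e_i^2)^2."""
    s2 = sum(e ** 2 for e in es)
    s3 = sum(e ** 3 for e in es)
    return s2 - 2 * s3 + s2 ** 2


def limiting_variance_from_matrix(es):
    """The same quantity obtained by summing the limiting covariance matrix."""
    return sum(
        (e ** 2 * (1 - e) ** 2 if i == j else e ** 2 * f ** 2)
        for i, e in enumerate(es)
        for j, f in enumerate(es)
    )
```

With proportions 1/2, 1/3, 1/6 the limiting variance is 67/324, and the exact variance divided by $n$ is already within about $10^{-3}$ of it at $n = 6000$.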

Corollary 2. The variable $Q = \sum \sigma^{ij} x_i x_j$, where the $x_i$ are defined by (6.2)
and $\|\sigma^{ij}\|$ is the inverse of (6.3), is asymptotically distributed according to the
$\chi^2$-law with $k$ degrees of freedom.

As was mentioned in Section 4, we could define variables $s_{ij}$ ($i = 1, 2, \cdots, k$
and $j = 1, 2, \cdots, h_i$, the $h_i$ being a set of $k$ arbitrary integers) with a distribution
similar to (2.14). If one worked through the details he would find, no
doubt, that these variables are asymptotically normal. The matrix of variances
and covariances is so complicated, however, that such a theorem would
hardly be useful to the statistician, and the author does not feel that it would
be worthwhile to go through the long and tedious details merely for the sake of
completeness.

Part II 

Instead of having the number of elements of each kind fixed, we now suppose
that they are randomly drawn from a binomial or multinomial population. The
numbers $n_i$ thus become random variables subject only to the restriction that
$\sum n_i = n$, the sample number. The development will be entirely analogous to
that of Part I, and the same notation will be used. The probability associated
with the $i$th kind of element will be denoted by $p_i$.

7. Distributions and moments. The major part of the derivation of the
various distribution functions has already been done in Sections 2 and 3. With
the distributions of these sections we need only employ the fundamental
relation

(7.1) $P(X, Y) = P_1(X \mid Y) P_2(Y)$

in order to obtain the distributions required here. $X$ will represent the set of
variables $r_{ij}$ or $r_i$, and $Y$ the variables $n_i$. For the binomial population
$P_2(Y)$ will be

(7.2) $P(n_1, n_2) = \left[{n \atop n_i}\right] p_1^{n_1} p_2^{n_2}$.

Therefore we may write down at once the distributions

(7.3) P(r„, ft,) = [*] ["J F(r u 

(7.4) ?<^> - [*](\t V*"' 

(7.5) P(r„ »,) = (* I 

(7.6) pfa /,- [*](* - 1 - f_T i)su " i )(" > t 

”■>-[:][:](*“ 

fan — b — (h -ik - A F(Sl) Si ) V ^ v r, 

\ Szh. ~ 1 / 

i = 1 , • • ■ ,k,j = 1 , • ■ ,h, 


(7.7) 
corresponding to the distributions (2.6), (2.9), (2.11), (2.13) and (2.14) respectively.
Of course there is some dependence among the arguments. In (7.4),
for example, $n_1$ is determined by $\sum i r_{1i} = n_1$, and $n_2$ by $n - n_1 = n_2$. In the
last three distributions one of the $n_i$ is independent and one may sum these
with respect to $n_1$ from zero to $n$ and obtain the distributions of the $r$’s alone.
The results of such summations are quite cumbersome and in some cases can
only be indicated, so we shall retain the $n_i$ as relevant variables. This remark
applies also to the multinomial distribution.
We shall obtain expressions for the joint moments of the variables in these 
distributions. It is clear that the moments in Section 3 will be of considerable 
aid; for, using the notation of (7.1), we have 


(7.8) $E(f(X) g(Y)) = \sum_{X, Y} f(X) g(Y) P(X, Y) = \sum_{Y} g(Y) P_2(Y) \left[\sum_{X} f(X) P_1(X \mid Y)\right]$

and the sum in the bracket on the right has been computed in Section 3. It remains
only for us to multiply the previous moments by $g(Y) P_2(Y)$ and sum on
$Y$. Corresponding to (3.4), (3.12), (3.9) and (3.19) we have


(7.9) 

(7.10) 

(7.11) 


(7.12) 


a(«r> n* 1 ) - i «!•’(».+~ a, V"ri\ 

\ i / «i“0 \ tii — Sia, / 

eU* n *’) - 1 »;•’(».+d im (” ~ 2 *v 'pr, 

\ i / nj-=o \ tii — 2ia, / 

E(»f’r'ft = 

s(nS->n ft »!!■>) « L + 

\ 1 1 / \ Si 2 j (L{ 1 / 

.("* - 2jb, + h- l\ F(gi) Sa ) p ji p M 

\ Si — 2'b, — 1 / 


for moments from (7.4), (7.6), (7.5) and (7.7) respectively. In order to perform
the summations indicated in these last relations it is necessary to expand the
factors multiplying the binomial coefficient in factorial powers of its lower
index. That is, we must write

“ +i> , * 

(7.13) tii o) (ti 2 + 1) CI,) = 22 CM a, b)(ni - b)‘°. 

t—0 


Again it is not possible to give a simple expression for the coefficients $C_t(n, a, b)$
in general, but for the first few moments they present no difficulty. For example,
from (7.9)
E(n t r u ) = L Wx (n -ni + l)( n l _ . ^p? 

m-0 \ ™1 1 / 


1 _»3 

P2 


(7.14) 


= 2 [»(« - i + 1) + (re - 2i)(»i ~ 0 + («i - i) (2J ] 


m 

/ ■ n \ 



In — i — 1> 

V nt- i ) 

) pW 

ii 

j ~i(n -* + !)( 

in — % — 

\ ni - i 


""s' 

1 

1 

. bO 

^ - in - 


\7li — 1 — 1 

= [tin 

- i + 1) + in 

— 2i) (n — 




S (2) „2l 


We give below some means, variances and covariances which will be required
later.


$E(r_{1i}) = p_1^i p_2 [(n - i - 1) p_2 + 2]$,

$E(s_{1k}) = p_1^k [(n - k) p_2 + 1]$,

o>i.r u = pl +5 pa ((n — i — j) m p\ + (n — i — j)p 2 ( 1 + 5p x ) + Gpf 

-I (n-i - l)p* + 2][(rc - j - l)pa + 2]), 
“Vnr,, = p\*pi {(n - 2i) <s, pj + (n - 2i)Th(l + 5p t ) + 6 p\ 

“[(»-*- 1)Pj +' 2]’} + pjpi[(n - i - l)pi + 2], 
(7.15) <r fur „ = pipj 1 (n - i - j - 2) w p\p\ + 4(n - i - j - l)piPi + 2 


- [(ra - » - l)p» + 2][(n - j - l)pi + 2], 
, 'urn* - p! +t ?*{(n -i~k + 1)® - 2(n - i - k) w Vl 

+ in - i - k - lVVi - [(n - % - l)pa + 2][{n - k)p% + 1]}, 
<r. 14 .» = p\ k {(n -2k + 1)® - 2(n - 2 k) w Vl + (n - 2fc)®pi 

— [(7! — k)pt + l] 2 } + P*[(7! — k)pi + 1), 

»*!».,/ = PipU(« - k - j - 2)®P?P2 + 2(ti - k - j - l)pi(l + pj) 
+ 2(1 + - pi[(n - k)pt + 1][(ti - j - l)pi + 2]). 
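The two means at the head of this list are easy to confirm by enumerating every sequence of $n$ binomial trials. The sketch below is illustrative only; it assumes the readings $E(r_{1i}) = p_1^i p_2[(n - i - 1)p_2 + 2]$ for runs of exactly $i$ elements of the first kind and $E(s_{1k}) = p_1^k[(n - k)p_2 + 1]$ for runs of at least $k$, and the function names are hypothetical.

```python
from fractions import Fraction
from itertools import product


def count_runs(seq, symbol, length, at_least=False):
    """Number of runs of `symbol` with exactly (or at least) `length` elements."""
    count, run = 0, 0
    for s in tuple(seq) + (None,):  # the sentinel closes a run at the end
        if s == symbol:
            run += 1
            continue
        if run and (run >= length if at_least else run == length):
            count += 1
        run = 0
    return count


def expected_count(n, p1, length, at_least=False):
    """Exact expectation over all 2^n sequences of independent trials."""
    p1 = Fraction(p1)
    p2 = 1 - p1
    total = Fraction(0)
    for seq in product("ab", repeat=n):
        weight = p1 ** seq.count("a") * p2 ** seq.count("b")
        total += weight * count_runs(seq, "a", length, at_least)
    return total


def mean_exact_length(n, p1, i):
    """Assumed closed form for E(r_1i)."""
    p1 = Fraction(p1)
    p2 = 1 - p1
    return p1 ** i * p2 * ((n - i - 1) * p2 + 2)


def mean_at_least(n, p1, k):
    """Assumed closed form for E(s_1k)."""
    p1 = Fraction(p1)
    p2 = 1 - p1
    return p1 ** k * ((n - k) * p2 + 1)
```

The term 2 in the first formula and the term 1 in the second come from runs that touch an end of the sequence, which need one fewer bounding element of the other kind.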

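In order to obtain the distribution of runs in samples from a multinomial
population, we multiply the distributions of Section 4 by

(7.16) $P(n_i) = \left[{n \atop n_i}\right] \prod_1^k p_i^{n_i}$.

Corresponding to (4.1) and (4.2) then, we have

(7.17) $P(r_{ij}, n_i) = \prod_{i=1}^{k} \left[{r_i \atop r_{ij}}\right] F(r_i) \prod_1^k p_i^{n_i}$,

(7.18) $P(r_i, n_i) = \prod_{i=1}^{k} \binom{n_i - 1}{r_i - 1} F(r_i) \prod_1^k p_i^{n_i}$.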
In (7.17) $r_{ij}$ is the number of runs of length $j$ of elements with probability $p_i$.
In (7.18) $r_i$ is the total number of runs of elements with probability $p_i$. As
before, we shall investigate in detail only the distribution (7.18). The moments
of $n_i$ and $r_i$ follow at once from (7.8) and (4.5)

(7.19) $E\left(\prod_i n_i^{(a_i)} u_i^{(b_i)}\right) = \sum_{n_i} \prod_i n_i^{(a_i)} (n_i - 1)^{(b_i)} \left[{n - \sum b_i \atop n_i - b_i}\right] \prod_i p_i^{n_i}$

where $u_i = n_i - r_i$. The means, variances and covariances of the $r_i$ are

(7.20)
$E(r_i) = n p_i (1 - p_i) + p_i^2$,
$\sigma_{r_i r_j} = -n p_i p_j (1 - 2p_i - 2p_j + 3 p_i p_j) - p_i p_j (2p_i + 2p_j - 5 p_i p_j)$,
$\sigma_{r_i r_i} = n p_i (1 - 4p_i + 6p_i^2 - 3p_i^3) + p_i^2 (3 - 8p_i + 5p_i^2)$.

8. Asymptotic distributions from binomial population. We turn our attention
first to the distribution (7.7) and state a theorem analogous to Theorem 2 of
Section 5.

Theorem 1. The variables

(8.1)
$u_i = x_i = \dfrac{s_{1i} - n p_1^i p_2^2}{\sqrt{n}}, \qquad i = 1, \cdots, k - 1$,
$u_k = x_k = \dfrac{s_{1k} - n p_1^k p_2}{\sqrt{n}}$,
$u_{k+i} = y_i = \dfrac{s_{2i} - n p_1^2 p_2^i}{\sqrt{n}}, \qquad i = 1, \cdots, h - 1$,
$u_{k+h} = z = \dfrac{n_1 - n p_1}{\sqrt{n}}$

are asymptotically normally distributed with zero means and variances and
covariances

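$\sigma_{x_i x_i} = p_1^i p_2^2 - (2i + 1) p_1^{2i} p_2^4 + 2 p_1^{2i+1} p_2^3$,
$\sigma_{x_i x_j} = -(i + j + 1) p_1^{i+j} p_2^4 + 2 p_1^{i+j+1} p_2^3$,
$\sigma_{x_i x_k} = -(i + k + 1) p_1^{i+k} p_2^3 + p_1^{i+k+1} p_2^2$,
$\sigma_{x_k x_k} = p_1^k p_2 - (2k + 1) p_1^{2k} p_2^2$,
$\sigma_{y_i y_j} = -(i + j + 1) p_1^4 p_2^{i+j} + 2 p_1^3 p_2^{i+j+1}$,

(8.2) $\sigma_{y_i y_i} = p_1^2 p_2^i - (2i + 1) p_1^4 p_2^{2i} + 2 p_1^3 p_2^{2i+1}$,

$\sigma_{x_i y_j} = -(i + j + 3) p_1^{i+2} p_2^{j+2} + 2 p_1^{i+1} p_2^{j+1}$,
$\sigma_{x_k y_j} = -(k + j + 2) p_1^{k+2} p_2^{j+1} + p_1^{k+1} p_2^j (1 + p_2)$,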

<r* ( . = tp\pl rf Pi +1 pa(l - 4pa), 

<r*». = (k + 1)PiPj ~ Pi(l + Pi), 

o-#(. = iplpi + PiPa +1 (l “ 4pi), 

<r« = Pipj ■ 
We have taken $s_{2h}$ and $n_2$ to be the dependent variables of (7.7). The method of
proof of this theorem is the same as that of Theorem 1 in Section 5, and will be
omitted. As consequences of the theorem we have
Corollary 1. The variable

$Q = \sum_{i,j=1}^{k+h} \sigma^{ij} u_i u_j$

is asymptotically distributed according to the $\chi^2$-law with $k + h$ degrees of freedom.

Corollary 2. Any subset $u_{i_1}, u_{i_2}, \cdots, u_{i_m}$ of the variables (8.1) is asymptotically
normally distributed with zero means and variances and covariances $\|\sigma_{i_j i_k}\|$,
and

$Q = \sum_{j,k=1}^{m} \sigma^{i_j i_k} u_{i_j} u_{i_k}$

is asymptotically distributed according to the $\chi^2$-law with $m$ degrees of freedom.
$\|\sigma^{i_j i_k}\|$ is the inverse of $\|\sigma_{i_j i_k}\|$.

Corollary 3. If $s_i = s_{1i} + s_{2i}$ represents the total number of runs of length $i$ of
both kinds of elements, and $s_k$ the number of runs of length greater than $k - 1$, then
the variables

(8.3)
$x_i = \dfrac{s_i - n(p_1^i p_2^2 + p_1^2 p_2^i)}{\sqrt{n}}, \qquad i = 1, \cdots, k - 1$,
$x_k = \dfrac{s_k - n(p_1^k p_2 + p_1 p_2^k)}{\sqrt{n}}$,

are asymptotically normally distributed with zero means and variances and
covariances

(8.4) $\sigma_{ij} = \sigma_{x_i x_j} + \sigma_{x_i y_j} + \sigma_{y_i x_j} + \sigma_{y_i y_j}$

where the terms on the right of (8.4) are defined by (8.2). We have put $h = k$
in Theorem 1 to obtain this result.

Corollary 4. The variable

(8.5) $Q = \sum_{i,j=1}^{k} \sigma^{ij} x_i x_j$

where the $x_i$ are defined by (8.3) and $\|\sigma^{ij}\|$ is the inverse of (8.4), is asymptotically
distributed according to the $\chi^2$-law with $k$ degrees of freedom.

Corollary 5. If $r$ denotes the total number of runs of both kinds of elements,
then

(8.6) $x = \dfrac{r - 2n p_1 p_2}{2\sqrt{n p_1 p_2 (1 - 3 p_1 p_2)}}$

is asymptotically normally distributed with zero mean and unit variance. This is
the result obtained by Wishart and Hirshfeld [11].
9. Asymptotic distributions from the multinomial population. In this
section we assume $k > 2$ to avoid degenerate distributions. Because of the
function $F(r_i)$ in (7.18) we do not investigate this distribution directly, but
derive a more general asymptotic distribution as was done in Section 6. We
consider the distribution

(9.1) $D(m_{ij}, n_i) = \prod_{i=1}^{k} \left( \left[{n_i \atop m_{ij}}\right] p_i^{n_i} \right)$

corresponding to (6.9). This is derived from (7.19) in the same manner as
(6.9) was from (4.5). As before, we have replaced the numbers $n_i - 1$ in (7.19)
by $n_i$, an unessential change as far as the asymptotic theory is concerned.
We recall that

(9.2) $r_i = n_i - m_{ii}$,

hence we need only show that the variables on the right are asymptotically
normally distributed in order to have the same result for the $r_i$. Corresponding
to Theorem 2 of Section 6, we state
Theorem 1. The variables

(9.3)
$x_{ij} = \dfrac{m_{ij} - n p_i p_j}{\sqrt{n}}, \qquad i, j = 1, \cdots, k - 1$,
$x_i = \dfrac{n_i - n p_i}{\sqrt{n}}, \qquad i = 1, \cdots, k - 1$
are asymptotically normally distributed with zero means and variances and
covariances

(9.4)
$\sigma_{ij,st} = -3 p_i p_j p_s p_t$,
$\sigma_{ij,it} = -3 p_i^2 p_j p_t$,
$\sigma_{ii,st} = -3 p_i^2 p_s p_t$,
$\sigma_{ii,ij} = p_i^2 p_j (1 - 3 p_i)$,
$\sigma_{ij,ij} = p_i p_j (1 - 3 p_i p_j)$,
$\sigma_{ii,ii} = p_i^2 (1 + 2 p_i - 3 p_i^2)$,
$\sigma_{ii,jj} = -3 p_i^2 p_j^2$,
$\sigma_{ii,j} = -2 p_i^2 p_j$,
$\sigma_{ij,s} = -2 p_i p_j p_s$,
$\sigma_{ij,i} = p_i p_j (1 - 2 p_i)$,
$\sigma_{ii,i} = 2 p_i^2 (1 - p_i)$,
$\sigma_{i,j} = -p_i p_j$,
$\sigma_{i,i} = p_i (1 - p_i)$.

In these relations the symbols are defined by

$\sigma_{ij,st} = \sigma_{x_{ij} x_{st}}, \qquad \sigma_{ij,s} = \sigma_{x_{ij} x_s}, \qquad \sigma_{i,j} = \sigma_{x_i x_j}$

and different literal subscripts represent different numerical subscripts. These
moments have been computed by means of the identity (6.12). The proof of
the theorem is like that of Theorem 2 of Section 6 and will be omitted. We can
now give the limiting form of the distribution of the $r_i$ in (7.18) as
Corollary 1. The variables

(9.5) $x_i = \dfrac{r_i - n p_i (1 - p_i)}{\sqrt{n}}, \qquad i = 1, 2, \cdots, k$

are asymptotically normally distributed with zero means and variances and
covariances

(9.6)
$\sigma_{ii} = p_i (1 - p_i) - 3 p_i^2 (1 - p_i)^2$,
$\sigma_{ij} = -p_i p_j (1 - 2 p_i - 2 p_j + 3 p_i p_j)$.

These limiting moments follow at once from equations (7.20).

Corollary 2. The variable

$Q = \sum_{i,j=1}^{k} \sigma^{ij} x_i x_j$

where the $x_i$ are defined by (9.5) and $\|\sigma^{ij}\|$ is the inverse of (9.6), is asymptotically
distributed according to the $\chi^2$-law with $k$ degrees of freedom.

Corollary 3. If $r = \sum r_i$ denotes the total number of runs, then

$\dfrac{r - n(1 - \sum p_i^2)}{\sqrt{n}}$

is asymptotically normally distributed with zero mean and variance

$S = \sum p_i^2 + 2 \sum p_i^3 - 3 (\sum p_i^2)^2$.
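The variance $S$ should equal the sum of all entries of the limiting covariance matrix of the $r_i$. A short exact check is sketched below, assuming the readings $\sigma_{ii} = p_i(1 - p_i) - 3p_i^2(1 - p_i)^2$ and $\sigma_{ij} = -p_ip_j(1 - 2p_i - 2p_j + 3p_ip_j)$; for two kinds it also compares $S$ with $4p_1p_2(1 - 3p_1p_2)$, the square of the divisor used for the binomial case. The function names are hypothetical.

```python
from fractions import Fraction


def limiting_covariance(ps):
    """Matrix of limiting variances and covariances assumed from (9.6)."""
    matrix = []
    for i, p in enumerate(ps):
        row = []
        for j, q in enumerate(ps):
            if i == j:
                row.append(p * (1 - p) - 3 * p ** 2 * (1 - p) ** 2)
            else:
                row.append(-p * q * (1 - 2 * p - 2 * q + 3 * p * q))
        matrix.append(row)
    return matrix


def variance_of_total(ps):
    """S = sum p_i^2 + 2 sum p_i^3 - 3 (sum p_i^2)^2."""
    s2 = sum(p ** 2 for p in ps)
    s3 = sum(p ** 3 for p in ps)
    return s2 + 2 * s3 - 3 * s2 ** 2


def sum_of_entries(matrix):
    """Variance of the sum of the variables with this covariance matrix."""
    return sum(sum(row) for row in matrix)
```

Using exact fractions avoids any doubt about rounding when the two expressions are compared.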


The author would like to record here his gratitude to Professor S. S. Wilks 
who suggested the problem and under whose direction this paper was written. 
A GENERALIZATION OF THE LAW OF LARGE NUMBERS 

By Hilda Geiringer 

It is well known that the law of large numbers can be established for dependent
as well as for independent chance variables by using Tchebycheff’s inequality [1]
and assuming that the variance of the sum of the variables tends towards
infinity less rapidly than $n^2$.

In recent years v. Mises has introduced the notion of statistical functions [2]
and has shown that, under certain assumptions, the law of large numbers is still
valid if, instead of the arithmetic mean of the $n$ observations $x_1, \cdots, x_n$, a
statistical function of these observations is considered. For example, in the very
special case where the $n$ collectives which have been observed are identical
$k$-valued arithmetic distributions with probabilities $p_1, \cdots, p_k$ corresponding
to the attributes $c_1, \cdots, c_k$ and with observed relative frequencies $n_1/n, \cdots,
n_k/n$, one obtains the result: It is to be expected for every $\epsilon > 0$, with a probability
$P_n$ converging towards one as $n \to \infty$, that $|f(n_1/n, \cdots, n_k/n) - f(p_1, \cdots, p_k)|
< \epsilon$ under very general conditions concerning the function $f$.

In the present paper we shall generalize these new results so that they will 
apply also to collectives which are not independent. 

1. Lemma concerning alternatives. Let us consider the $n$-dimensional
collective consisting of a sequence of $n$ trials and let us assume that the $n$ trials are
alternatives, i.e. for each trial there are only two possible results which we
denote by “success,” “failure,” by “occurrence,” “non-occurrence” or by
“1,” “0.” The total result of the $n$ trials is expressed by $n$ numbers each equal
to 0 or 1. Let $v(x_1, x_2, \cdots, x_n)$ be the probability of obtaining the result $x_1$
at the first trial, $x_2$ at the second one, $\cdots$, $x_n$ at the last one ($x_\nu = 0, 1$; $\nu =
1, \cdots, n$). In the same way we introduce $v_{12}(x, y) = \sum_{x_3, \cdots, x_n} v(x, y, x_3, \cdots, x_n)$
and generally $v_{\mu\nu}(x, y)$ as the probability that the $\mu$th result equals $x$, the $\nu$th
equals $y$ ($\mu \ne \nu$), and finally let $v_\mu(x) = \sum_y v_{\mu\nu}(x, y)$ be the probability that the
$\mu$th result equals $x$. In particular let us write

$v_\mu(1) = p_\mu, \qquad v_{\mu\nu}(1, 1) = p_{\mu\nu}, \qquad (\mu, \nu = 1, \cdots, n;\ \mu \ne \nu)$

$p_\mu$ being the probability of success in the $\mu$th trial and $p_{\mu\nu}$ the probability of
simultaneous success both in the $\mu$th and $\nu$th trials.

The variance $s_n^2$ of the sum $(x_1 + \dots + x_n)$ is easily found:

$s_n^2 = \mathrm{Var}(x_1 + \dots + x_n) = \sum_{x_1, \dots, x_n} (x_1 + \dots + x_n - p_1 - \dots - p_n)^2 \, v(x_1, \dots, x_n)$

$= \sum_{x_1, \dots, x_n} (x_1 - p_1)^2 \, v(x_1, \dots, x_n) + \dots + 2 \sum_{x_1, \dots, x_n} (x_1 - p_1)(x_2 - p_2) \, v(x_1, \dots, x_n) + \dots$

$= \sum_{x_1} (x_1 - p_1)^2 \, v_1(x_1) + \dots + 2 \sum_{x_1, x_2} (x_1 - p_1)(x_2 - p_2) \, v_{12}(x_1, x_2) + \dots$

$= p_1(1 - p_1) + \dots + p_n(1 - p_n) + 2(p_{12} - p_1 p_2) + \dots + 2(p_{n-1,n} - p_{n-1} p_n).$

Thus:

(1)  $s_n^2 = \mathrm{Var}(x_1 + \dots + x_n) = \sum_{\nu=1}^{n} p_\nu(1 - p_\nu) + 2 \sum_{1 \le \mu < \nu \le n} (p_{\mu\nu} - p_\mu p_\nu).$

The first sum on the right is $\le n/4$; the second one consists of $N = \tfrac{1}{2} n(n - 1)$
terms, therefore we cannot be sure that it tends toward zero after division by $n^2$.
Putting $p_{\mu\nu} - p_\mu p_\nu = a_{\mu\nu}^{(n)}$ we see immediately:

(a) A necessary and sufficient condition for $\lim_{n \to \infty} s_n/n = 0$ is

(2)  $\lim_{n \to \infty} \frac{1}{n^2} \sum_{1 \le \mu < \nu \le n} a_{\mu\nu}^{(n)} = 0.$
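
Formula (1) can be checked by brute force on a small case. The sketch below is only an illustration: the joint law `v` and the weights that define it are hypothetical, chosen arbitrarily so that the three alternatives are dependent. It compares the variance computed directly from `v` with the right-hand side of (1).

```python
from itertools import combinations, product

def variance_of_sum(v):
    """Variance of x_1 + ... + x_n computed directly from the joint law v."""
    mean = sum(sum(x) * q for x, q in v.items())
    return sum((sum(x) - mean) ** 2 * q for x, q in v.items())

def variance_by_formula(v, n):
    """Right-hand side of (1): sum of p(1 - p) plus twice the sum of a_{mu,nu} over mu < nu."""
    p = [sum(q for x, q in v.items() if x[i] == 1) for i in range(n)]
    pair_sum = 0.0
    for mu, nu in combinations(range(n), 2):
        p_both = sum(q for x, q in v.items() if x[mu] == 1 and x[nu] == 1)
        pair_sum += p_both - p[mu] * p[nu]
    return sum(pi * (1 - pi) for pi in p) + 2 * pair_sum

# Hypothetical dependent law on n = 3 alternatives; the weights are arbitrary.
n = 3
weights = {x: 1 + x[0] * x[1] + 2 * x[1] * x[2] for x in product((0, 1), repeat=n)}
total = sum(weights.values())
v = {x: w / total for x, w in weights.items()}

direct = variance_of_sum(v)
by_formula = variance_by_formula(v, n)
print(direct, by_formula)  # the two values agree up to rounding
```

Dividing either value by $n^2$ gives the quantity $s_n^2/n^2$ whose limit condition (a) is concerned with.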


Denoting by $\sigma_\mu^2$ the variance of $v_\mu(x)$ and by $r_{\mu\nu}$ the correlation coefficient of
$v_{\mu\nu}(x, y)$ we have

$a_{\mu\nu}^{(n)} = p_{\mu\nu} - p_\mu p_\nu = r_{\mu\nu} \sigma_\mu \sigma_\nu.$


We see that $a_{\mu\nu}^{(n)}$ takes values between $-1/4$ and $+1/4$ and our condition (2)
postulates that the sum of these positive and negative terms tends towards
infinity less rapidly than $n^2$. As to the meaning of the signs of these terms we
see that a term $a_{\mu\nu}^{(n)}$ will be $\gtreqless 0$ according as $p_{\mu\nu}/p_\nu \gtreqless p_\mu$. This means: the
fact that the $\nu$th event has presented itself makes the occurrence of the $\mu$th
event either more probable; or it is without influence on it; or it makes it less
probable. And we see that $s_n/n$ tends toward zero only if there is a certain
“equalization” or “stabilization” of positive and negative mutual influence.
If in particular for a pair of values $\mu, \nu$, $r_{\mu\nu} = +1$, that is $v_{\mu\nu}(0, 1) = v_{\mu\nu}(1, 0) = 0$,
the events must either both occur or both fail and $p_\mu = p_\nu$. If $r_{\mu\nu} = -1$ we
have $v_{\mu\nu}(0, 0) = v_{\mu\nu}(1, 1) = 0$: the simultaneous occurrence is impossible and
likewise the simultaneous failure, and $p_\mu + p_\nu = 1$. If we have $p_{\mu\nu} = 0$ (case of
mutually exclusive events) then $p_\mu + p_\nu \le 1$.

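Since $s_n^2 \ge 0$ and $\sum_{\nu=1}^{n} p_\nu(1 - p_\nu) = \sum_{\nu=1}^{n} \sigma_\nu^2 \le n/4$ we conclude from (1) that
$\sum_{1 \le \mu < \nu \le n} a_{\mu\nu}^{(n)} \ge -n/8$ and we obtain the following simple sufficient condition for the
validity of (2):

(b) Let us denote by $m_n$ the number of all combinations $\mu, \nu$ ($\mu \le n$; $\nu \le n$; $\mu \ne \nu$)
such that, however large $n$ may be, $a_{\mu\nu}^{(n)} > \epsilon$, where $\epsilon$ is a given positive number;
then $\frac{1}{n^2} \sum a_{\mu\nu}^{(n)}$ converges toward zero if $\lim m_n/n^2 = 0$.

We have in fact

$-\frac{n}{8} \le \sum_{1 \le \mu < \nu \le n} a_{\mu\nu}^{(n)} \le m_n + (N - m_n)\epsilon$

and dividing by $n^2$ we find that $\frac{1}{n^2} \sum a_{\mu\nu}^{(n)}$ is enclosed between $-\frac{1}{8n}$ and
$\frac{m_n}{n^2} + \epsilon \frac{N - m_n}{n^2}$. The first bound tends toward zero; the second does not exceed
$m_n/n^2 + \epsilon/2$, which can be made as small as we please since $\epsilon$ is arbitrary. Roughly
speaking this condition implies that for “almost all” combinations of indices $\mu, \nu$, the
$a_{\mu\nu}^{(n)}$ converge toward “negative or vanishing correlation.”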

On the other hand the sum of all positive and negative terms in $\sum_{1 \le \mu < \nu \le n} a_{\mu\nu}^{(n)}$
cannot become less than $-n/8$. Therefore, if “almost all” positive terms are
supposed to tend towards zero it follows that also almost all negative terms
tend toward zero. Thus we obtain the sufficient condition (c), which is neither
more nor less general than (b):

(c) The sum $\frac{1}{n^2} \sum_{1 \le \mu < \nu \le n} a_{\mu\nu}^{(n)}$ tends towards zero as $n \to \infty$ if “almost all” the
individual terms $a_{\mu\nu}^{(n)} = p_{\mu\nu} - p_\mu p_\nu$ tend toward zero. Or more exactly, the sum in
question tends toward zero if $|a_{\mu\nu}^{(n)}| \le \epsilon$ for every $\epsilon$ and sufficiently large $n$, with
the exception of $\mu_n$ terms where $\lim_{n \to \infty} \mu_n/n^2 = 0$. That is “convergence towards
independence” for almost all combinations $\mu, \nu$ of indices. Let us, for example,
assume that all the $p_\nu$ are $\ne 0$ and all the $p_{\mu\nu} = 0$; then all the $a_{\mu\nu}^{(n)}$ are certainly
$< 0$ and (b) is fulfilled; but it is easily seen (3) that in this case $p_1 + p_2 + \dots + p_n \le 1$.
Therefore all the products $p_\mu p_\nu$ (with the possible exception of a finite
number) tend toward zero, and (c) holds as well.


2. Statistical functions. Suppose $n$ observations have given the results
$x_1, x_2, \dots, x_n$. Let us assume for the sake of simplicity that they are all
bounded between two real numbers $A$ and $B$. To each real $x$ corresponds the
number $n \cdot S_n(x)$ of observations with a result $\le x$. $S_n(x)$ is a monotone non-decreasing
step function with $n$ steps, each of height $1/n$; however several steps
may coincide at the same point. We have

(1)  $S_n(x) = 0$ if $x < A$ and $S_n(x) = 1$ if $x \ge B$.

$S_n(x)$ is called by v. Mises the partition (Aufteilung) of the $n$ observations.
$S_n(x)$ coincides with the well known cumulative frequency distribution if the
attributes $c_\kappa$ ($\kappa = 1, \dots, k$) and the corresponding relative frequencies $n_1/n, \dots, n_k/n$
are given.

A statistical function is a function of the $x_1, x_2, \dots, x_n$ which depends only on
$S_n(x)$, the partition of the $n$ results. It will be denoted by $f\{S_n(x)\}$. If the $c_\kappa$
and the $n_\kappa/n$ are given then statistical function means simply “function of the
relative frequencies” and it becomes a function of $k$ variables. In $f\{S_n(x)\}$ the
partition $S_n(x)$ takes the place of the independent variable. Such a statistical
function has the following properties: (a) It is a symmetric function of the
$x_1, x_2, \dots, x_n$. That is, it is independent of the succession of the $n$ results.
(b) It is “homogeneous” in the following sense: If instead of $n$ observations
we have $nl$ observations, each $x_\nu$ being replaced by $l$ observations equal to $x_\nu$, then
the statistical function is not changed.¹ Examples of statistical functions are
the moments

$\frac{1}{n} \sum_{\nu=1}^{n} x_\nu^r = \int x^r \, dS_n(x) = M_r^0$

or, if $M_1^0 = a$, the moments about the mean $a$:

$\frac{1}{n} \sum_{\nu=1}^{n} (x_\nu - a)^r = \int (x - a)^r \, dS_n(x) = M_r$, etc.
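
The definitions above can be made concrete with a small sketch. It builds the partition $S_n(x)$ of a sample, evaluates a moment, and illustrates the homogeneity property (b) with the geometric mean (which is unchanged when every observation is repeated) against the plain product (which is not). The sample values and the function names are hypothetical, introduced only for illustration.

```python
import math

def partition(xs):
    """Return S_n as a function: S_n(x) = (number of observations <= x) / n."""
    n = len(xs)
    return lambda x: sum(1 for xi in xs if xi <= x) / n

def moment(xs, r):
    """Moment of order r about the origin, i.e. the integral of x**r dS_n(x)."""
    return sum(xi ** r for xi in xs) / len(xs)

def geometric_mean(xs):
    """n-th root of the product of the observations."""
    return math.prod(xs) ** (1 / len(xs))

xs = [1.0, 2.0, 2.0, 4.0]      # hypothetical sample, n = 4
repeated = xs * 3              # each observation now occurs l = 3 times

S, S_repeated = partition(xs), partition(repeated)
print(S(2.0), S_repeated(2.0))                       # same partition value
print(moment(xs, 2), moment(repeated, 2))            # same second moment
print(geometric_mean(xs), geometric_mean(repeated))  # equal up to rounding
print(math.prod(xs), math.prod(repeated))            # the product itself changes
```

Because the repeated sample has the same partition as the original one, any function of the partition alone must give the same value on both, which is what the first three comparisons show.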

The independent variable in $f\{S_n(x)\}$ is a partition; but in addition we shall
define $f\{P(x)\}$ where $P(x)$ is a certain bounded distribution which is not necessarily
a partition. A distribution $P(x)$ is called bounded if

(1')  $P(x) = 0$ if $x < A$ and $P(x) = 1$ if $x \ge B$.

If this is true for a sequence $P_1(x), P_2(x), \dots$ with the same $A$ and $B$ then the
sequence is called uniformly bounded. Let us now consider a bounded distribution
$P(x)$ which in every point of continuity of $P(x)$ is the limit as $n \to \infty$ of a sequence
of bounded partitions $S_n(x)$. As $S_n(x)$ converges toward $P(x)$, if
$f\{S_n(x)\}$ converges towards a limit $L$ which does not depend on the limiting
process $S_n(x) \to P(x)$ then that limit shall be denoted by $f\{P(x)\}$; it will be
called the value of the statistical function at the “point” $P(x)$ and $f\{S_n(x)\}$ will be
called continuous at $P(x)$. The definition of continuity can be given also in the
following way: Corresponding to every $\epsilon > 0$ exists an $\eta > 0$ such that

(2)  $|f\{S_n(x)\} - f\{P(x)\}| < \epsilon$

for all values of $n$ and for every bounded $S_n(x)$ such that at every point of
continuity of $P(x)$

(3)  $|S_n(x) - P(x)| \le \eta.$

In this case $f\{S_n(x)\}$ is called continuous at the point $P(x)$. Thus a statistical
function is defined for bounded partitions and for certain bounded distributions
which are not themselves partitions. If the continuity defined by (2) and (3)
exists for a sequence $P_1(x), P_2(x), \dots$ of bounded distributions with the same $\eta$
corresponding to a given $\epsilon$, we call the statistical function uniformly continuous
at the points $P_1(x), P_2(x), \dots$.

¹ This condition of homogeneity is fulfilled e.g. for $\sqrt[n]{x_1 x_2 \cdots x_n}$ but not for $x_1 x_2 \cdots x_n$.


3. The general law of large numbers. The generalization of the law of large
numbers which we have in mind can be demonstrated in a way analogous to the
demonstration given by v. Mises in the case of independent collectives if we
introduce the results of paragraph 1 in order to estimate the variance. We shall
consider here only one dimensional, bounded collectives in order to make clearer
what is essential in the generalization.

A sequence of dependent collectives $P_1(x), P_2(x), \dots, P_n(x)$ can be given in
the following manner. Let $P(x_1, x_2, \dots, x_n)$ be the probability that the result
of the first observation is $\le x_1$, of the second $\le x_2$, $\dots$, of the $n$th $\le x_n$.
This distribution will be said to be bounded in $(A, B)$ if $P = 1$ when all the $x_\nu$
are $\ge B$ and $P = 0$ if at least one of these arguments is less than $A$. From this
$n$-dimensional distribution we deduce $n$ one dimensional distributions

(1)  $P_1(x) = P(x, B, \dots, B)$, $P_2(x) = P(B, x, B, \dots, B)$, $\dots$, $P_n(x) = P(B, \dots, B, x)$

where $P_\nu(x)$ is the probability that the $\nu$th observation be $\le x$. The $P_\nu(x)$ are
uniformly bounded in $(A, B)$, which is a consequence of $P(x_1, x_2, \dots, x_n)$ having
been assumed to be bounded in this interval. In an analogous way we deduce
from $P(x_1, x_2, \dots, x_n)$ the $\tfrac{1}{2} n(n - 1)$ uniformly bounded two dimensional
distributions

(2)  $P_{12}(x, y) = P(x, y, B, \dots, B)$, $P_{13}(x, y) = P(x, B, y, B, \dots, B)$, $\dots$

Here $P_{\mu\nu}(x, y)$ is the probability that the $\mu$th result is $\le x$, the $\nu$th result $\le y$,
and we have $P_{\mu\nu}(x, y) = P_{\nu\mu}(y, x)$. Of course we have also

(1')  $P_1(x) = P_{12}(x, B) = P_{13}(x, B) = \dots = P_{1n}(x, B)$

      $P_2(x) = P_{12}(B, x) = P_{23}(x, B) = \dots = P_{2n}(x, B)$, etc.

If we put in (2) $x = y$ we obtain $P_{\mu\nu}(x, x) = P_{\nu\mu}(x, x)$ and we introduce

(3)  $P_{\mu\nu}(x, x) = P_{\mu\nu}(x) = P_{\nu\mu}(x)$,

the probability that both the $\mu$th and the $\nu$th observation are $\le x$. Then $P_{\mu\nu}(x)$
equals zero if $x < A$ and equals one if $x \ge B$, and this is valid with the same $A$
and $B$ for all the distributions $P_{\mu\nu}(x)$.

Now if $p_1, p_2, \dots, p_n$ are the probabilities of success for $n$ general alternatives,
Tchebycheff's Lemma asserts that the probability $W$ that the average
$(x_1 + x_2 + \dots + x_n)/n$ of $n$ observations differs by more than $\eta$ from its expectation
$(p_1 + p_2 + \dots + p_n)/n$ is subject to the following inequality

(4)  $W \le \frac{1}{\eta^2} \mathrm{Var}\left(\frac{x_1 + x_2 + \dots + x_n}{n}\right) = \frac{s_n^2}{\eta^2 n^2}.$

Here $s_n^2$ is given by (1) of paragraph 1.

Let us introduce the average $\bar P_n(x)$ of the $P_\nu(x)$:

(5)  $\bar P_n(x) = [P_1(x) + P_2(x) + \dots + P_n(x)]/n$

and let $Q_n$ be the probability that at any point of continuity of $\bar P_n(x)$ the inequality

(6)  $|S_n(x) - \bar P_n(x)| > \eta$

holds. Our aim will be to show that for every $\eta$, under certain restrictions regarding
the given collectives, $Q_n$ tends toward zero as $n$ tends toward infinity.

For a fixed point $x'$ the probabilities $P_\nu(x') = p_\nu$ and $P_{\mu\nu}(x') = p_{\mu\nu}$ are constants
and we put $\bar P_n(x') = \bar p_n = (p_1 + p_2 + \dots + p_n)/n$. The probability that in $x'$

(7)  $|S_n(x') - \bar P_n(x')| > \eta/2$

is then, according to (4), not greater than $(s_n^2)_{x'} / \big((\tfrac{1}{2}\eta)^2 n^2\big)$. Here we denote by $(s_n^2)_{x'}$
the value of $s_n^2$ in $x'$ (as given by (1) in paragraph 1).

Now we divide the interval $(A, B)$ in $N$ parts in such a way that in every one
of the $N$ intervals, e.g. in $(x', x'')$, the variation

(8)  $\delta = \bar P_n(x'') - \bar P_n(x') \le \eta/2.$

If there is at $x'$ (or at $x''$) a step of $\bar P_n(x)$ we take the limit which $\bar P_n(x)$ approaches
as $x \to x'$ (or $x''$) from the interior of the interval. In order to obtain such a
division we need only divide the total variation 1 of $\bar P_n(x)$ in $2/\eta$ equal parts and
project these points of division on $\bar P_n(x)$, disposing however in a suitable way of
horizontal parts of $\bar P_n(x)$. The abscissae of these points form the endpoints
of the $N$ intervals. If there is a step of $\bar P_n(x)$ at an endpoint of one of these
intervals the variation in both the adjacent intervals can only be diminished.
It is further possible that the two ends of an interval coincide, $x' = x''$; this will
be so if $\bar P_n(x)$ has for $x'$ a step $> \eta/2$. In any case we have a division in $N \le 2/\eta$
intervals such that all the points of continuity of $\bar P_n(x)$ are enclosed in them and
in each of these intervals (8) is valid.

Let us now assume that in the left end point $x'$ of the $r$th interval $(x', x'')$ the
inequality

(9)  $|S_n(x') - \bar P_n(x')| \le \eta/2$

is valid. Then we have for every $x$ between $x'$ and $x''$

(10)  $|S_n(x) - \bar P_n(x)| \le \eta/2 + \delta \le \eta.$

Because, since $S_n(x)$ and $\bar P_n(x)$ are both monotone, the difference $S_n(x') - \bar P_n(x')$
cannot increase by more than $\delta \le \eta/2$ as $x$ varies from $x'$ to $x''$. Therefore
if (6) is valid for any point $x$ in this interval then (7) must be valid for
the left end point $x'$ of this interval and the probability $q_r$ of this latter inequality
is less than or equal to $4(s_n^2)_{x'}/(\eta^2 n^2)$.

But there are $N$ intervals with the left endpoints $x'_1, x'_2, \dots, x'_N$ and the
probability that (6) may be valid in any point belonging to any one of these
intervals is $\le q_1 + q_2 + \dots + q_N$. Denoting by $\bar s_n^2$ the greatest of the $N$
variances $(s_n^2)_{x'_1}, (s_n^2)_{x'_2}, \dots, (s_n^2)_{x'_N}$ we have for $Q_n$ (which is the probability that
(6) may be valid at any point of continuity of $\bar P_n(x)$) the inequality

(11)  $Q_n \le q_1 + q_2 + \dots + q_N \le \frac{4N}{\eta^2 n^2} \bar s_n^2 \le \frac{8 \bar s_n^2}{\eta^3 n^2}.$

Therefore $Q_n$ tends toward zero for every $\eta$ if $\bar s_n/n$ tends toward zero.

But according to (2) in paragraph 1, $s_n/n$ tends toward zero if for every $x$ in
$(A, B)$

(12)  $\lim_{n \to \infty} \frac{1}{n^2} \sum_{1 \le \mu < \nu \le n} [P_{\mu\nu}(x) - P_\mu(x) P_\nu(x)] = 0.$

Considering the definition of continuity of a statistical function we have obtained
the following result:

As in (1'), (2), (3) and (5) let $P_{\mu\nu}(x, y)$ be two dimensional distributions ($\mu, \nu = 1, \dots, n$),
uniformly bounded in $(A, B)$; $P_{\mu\nu}(x, B) = P_\mu(x)$; $P_{\mu\nu}(x, x) = P_{\mu\nu}(x)$ and
$\bar P_n(x) = \frac{1}{n}(P_1(x) + P_2(x) + \dots + P_n(x))$.

If the variable partition $S_n(x)$ is bounded in $(A, B)$ and if $f\{S_n(x)\}$ is uniformly
continuous at the “points” $\bar P_1(x), \bar P_2(x), \dots$ then the probability that

(13)  $|f\{S_n(x)\} - f\{\bar P_n(x)\}| > \epsilon$

tends toward zero for every $\epsilon$ as $n \to \infty$, provided (12) is uniformly valid for every
$x$ in $(A, B)$.

4. Examples. Let us illustrate by simple examples.

1) In order to define the $P_\nu(x)$ etc. mentioned in our theorem we define the
$n$-dimensional distribution $P(x_1, x_2, \dots, x_n)$ used at the beginning of paragraph
3 by indicating the probability density

(1)  $v(x_1, x_2, \dots, x_n) = C_n[1 - x_1 x_2 \cdots x_n]$ in the “unit cube”,
     $= 0$ elsewhere.

The corresponding probability distribution is

(2)  $P(x_1, x_2, \dots, x_n) = \int_{-\infty}^{x_1} \cdots \int_{-\infty}^{x_n} v(x_1, x_2, \dots, x_n) \, dx_1 \cdots dx_n.$

By putting

(3)  $C_n = \frac{2^n}{2^n - 1}$

we see that $P(x_1, x_2, \dots, x_n)$ equals unity if all the arguments are $\ge 1$ and it
equals zero if one of these arguments is less than 0. Therefore $P(x_1, x_2, \dots, x_n)$
is bounded in the unit cube.

From (1) we deduce the two-dimensional densities

(4)  $v_{\mu\nu}(x, y) = C_n\left(1 - \frac{xy}{2^{n-2}}\right)$ in the unit square,
     $= 0$ elsewhere

and the distributions

(5)  $P_{\mu\nu}(x, y) = \int_{-\infty}^{x} \int_{-\infty}^{y} v_{\mu\nu}(x, y) \, dx \, dy.$

We see that

$P_{\mu\nu}(x, y) = C_n xy \left(1 - \frac{xy}{2^n}\right)$ in the unit square
$= 0$ if $x$ or $y \le 0$
$= 1$ if $x$ and $y \ge 1$

and e.g. for $x \ge 1$, $0 < y < 1$ we have $P_{\mu\nu}(x, y) = P_{\mu\nu}(1, y)$ etc. Thus the
$P_{\mu\nu}(x, y)$ are completely given.

It follows from (3) that $-C_n/2^n = 1 - C_n$; therefore putting $C_n = C$ we
have in $(0, 1)$

$P_{\mu\nu}(x, x) = P_{\mu\nu}(x) = Cx^2 + (1 - C)x^4$
$P_\nu(x) = Cx + (1 - C)x^2$

$P_{\mu\nu}(x) - P_\mu(x) P_\nu(x) = C(1 - C)x^2(1 - x)^2$

is $< 0$ for every $x$ in $(0, 1)$ since $C > 1$. For $x \le 0$, $P_{\mu\nu}(x)$ and $P_\nu(x)$ both
equal zero and for $x \ge 1$ they both equal 1. Therefore our conditions of paragraph 1
are fulfilled. We see that $C_n$ tends towards unity as $n \to \infty$; therefore
for every $x$ in $(0, 1)$ $P_{\mu\nu}(x) - P_\mu(x) P_\nu(x)$ tends towards zero: we have “convergence
towards independence” but by no means independence.
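
The algebra of this first example is easy to confirm numerically. The sketch below (function names are ours, chosen for illustration) evaluates $P_{\mu\nu}(x) - P_\mu(x) P_\nu(x)$ from the two polynomials and compares it with the closed form $C(1 - C)x^2(1 - x)^2$, then shows the difference shrinking as $n$ grows.

```python
def c_n(n):
    """Normalizing constant (3): C_n = 2**n / (2**n - 1)."""
    return 2 ** n / (2 ** n - 1)

def p_single(x, n):
    """P_nu(x) = C x + (1 - C) x**2 on (0, 1)."""
    c = c_n(n)
    return c * x + (1 - c) * x ** 2

def p_pair(x, n):
    """P_{mu nu}(x) = C x**2 + (1 - C) x**4 on (0, 1)."""
    c = c_n(n)
    return c * x ** 2 + (1 - c) * x ** 4

def dependence_term(x, n):
    """P_{mu nu}(x) - P_mu(x) P_nu(x); all one-dimensional margins are equal here."""
    return p_pair(x, n) - p_single(x, n) ** 2

def closed_form(x, n):
    """C (1 - C) x**2 (1 - x)**2, the expression stated in the text."""
    c = c_n(n)
    return c * (1 - c) * x ** 2 * (1 - x) ** 2

for n in (2, 5, 10, 20):
    print(n, dependence_term(0.5, n), closed_form(0.5, n))
```

For every $n$ the term is negative inside $(0, 1)$, and its size decreases roughly like $2^{-n}$, in line with the statement that there is convergence towards independence without independence.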

This example was based on a symmetric density. Let us give an example of
asymmetric and arithmetic distributions. For the sake of simplicity let $P_1(x)$,
$P_2(x), \dots$ be arithmetic distributions each with only three steps at $x = 0$, 1
and 2. As starting point we take the $n$-dimensional arithmetic distribution
$v(x_1, x_2, \dots, x_n)$ which gives the probability that the first result equals $x_1$, the
second $x_2$, $\dots$, the $n$th $x_n$, the $x_\nu$ being equal to 0 or 1 or 2; thus $v(x_1, x_2, \dots, x_n)$
takes $3^n$ values the sum of which equals unity. We deduce the two dimensional
distributions $v_{\mu\nu}(x, y)$, e.g. $v_{12}(x, y) = \sum_{x_3, \dots, x_n} v(x, y, x_3, \dots, x_n)$, the
probability that the first result equals $x$, the second $y$, and finally the $v_1(x) = \sum_y v_{12}(x, y)$,
etc. According to the definitions of $P_\nu(x)$ and $P_{\mu\nu}(x)$ we have then:


( 6 ) 

therefore 

(7)

(8)  $P_\nu(x) = 0$  ($x < 0$)
     $= v_\nu(0)$  ($0 \le x < 1$)
     $= v_\nu(0) + v_\nu(1)$  ($1 \le x < 2$)
     $= 1$  ($2 \le x$),

(9)  $P_{\mu\nu}(x) = 0$  ($x < 0$)
     $= v_{\mu\nu}(0, 0)$  ($0 \le x < 1$)
     $= v_{\mu\nu}(0, 0) + v_{\mu\nu}(1, 0) + v_{\mu\nu}(0, 1) + v_{\mu\nu}(1, 1)$  ($1 \le x < 2$)
     $= 1$  ($2 \le x$).

Now we subject $v(x_1, \dots, x_n)$ to the following conditions: Every $v(x_1, \dots, x_n)$
equals zero if it contains either: at least two “zeros,” or: at least one “zero”
and one “one,” or: at least two “ones.” All the other $v$-values are supposed
to be different from zero. Then we have

$v_{\mu\nu}(0, 0) = v_{\mu\nu}(1, 0) = v_{\mu\nu}(0, 1) = v_{\mu\nu}(1, 1) = 0$,

therefore $P_{\mu\nu}(x) = 0$ for $x < 2$ and $P_{\mu\nu}(x) = 1$ for $x \ge 2$. On the other hand
$v_\nu(0) = v(2, 2, \dots, 2, 0, 2, \dots, 2)$ and $v_\nu(1) = v(2, 2, \dots, 2, 1, 2, \dots, 2)$; therefore
$P_\nu(x) \ne 0$ for $0 \le x < 2$ and we have thus for every finite $n$

$P_{\mu\nu}(x) - P_\mu(x) P_\nu(x) = 0$ for $x < 0$ and $x \ge 2$,
$< 0$ for $0 \le x < 2$.

Therefore the condition (b) of paragraph 1 is fulfilled and thus (12) paragraph 3
holds.
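
A small sketch of this second example follows. The text only requires the admissible $v$-values to be non-zero; the choice of equal probabilities on the admissible points is our assumption, made purely for illustration, as are the function names. The code evaluates the averaged sum that appears in (12).

```python
from itertools import combinations

def admissible_points(n):
    """Points of {0, 1, 2}**n allowed by the conditions: at most one coordinate differs from 2."""
    points = [tuple([2] * n)]
    for i in range(n):
        for value in (0, 1):
            x = [2] * n
            x[i] = value
            points.append(tuple(x))
    return points

def averaged_dependence(n, x):
    """(1 / n**2) * sum over mu < nu of P_{mu nu}(x) - P_mu(x) P_nu(x).

    Assumes equal probability on every admissible point (an illustrative choice).
    """
    points = admissible_points(n)
    prob = 1 / len(points)

    def p_single(nu):
        return sum(prob for p in points if p[nu] <= x)

    def p_pair(mu, nu):
        return sum(prob for p in points if p[mu] <= x and p[nu] <= x)

    total = sum(p_pair(mu, nu) - p_single(mu) * p_single(nu)
                for mu, nu in combinations(range(n), 2))
    return total / n ** 2

for n in (5, 10, 40):
    print(n, averaged_dependence(n, 1.5))
```

Every term is negative for $0 \le x < 2$, yet the averaged sum shrinks towards zero as $n$ increases, which is the behaviour condition (12) asks for.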

I hope to have the opportunity to discuss more general applications of this 
theorem later. 

A generalization of the strong law of large numbers may be given in a similar way.


CONDITIONS FOR UNIQUENESS IN THE PROBLEM OF MOMENTS

By M. G. Kendall


It was shown by Stieltjes [1] that in some circumstances it is possible for two
different frequency distributions to have the same set of moments. For instance,
the integral

$\int z^{4n+3} e^{-z} e^{iz} \, dz$

around a contour consisting of the positive $x$-axis, the infinite quadrant and
the positive $y$-axis is seen to be zero and it follows that

$\int_0^\infty x^n e^{-x^{1/4}} \sin x^{1/4} \, dx = 0.$

Thus the frequency distribution

(1)  $dF = \tfrac{1}{24} e^{-x^{1/4}} (1 - \lambda \sin x^{1/4}) \, dx$,  $0 < x < \infty$,  $0 < \lambda < 1$,

has moments which are independent of $\lambda$, and equation (1) may be regarded as
defining a whole family of distributions each of which has the same moments.
It is easy to see that moments of all orders exist, and in fact

$\mu_r$ (about the origin) $= \tfrac{1}{6}(4r + 3)!$.

A second example of the same kind, also due to Stieltjes, is the distribution

(2)  $dF = \frac{1}{e^{1/4} \sqrt{\pi}} \, x^{-\log x} \{1 - \lambda \sin(2\pi \log x)\} \, dx$,  $0 < x < \infty$,  $0 < \lambda < 1$,

for which

$\mu_r = e^{\frac{1}{4} r(r+2)}$.

The question naturally arises, what are the conditions under which a given 
set of moments determines a frequency distribution uniquely? The question 
is of great interest to mathematicians, being closely linked with problems in the 
theory of asymptotic series, continued fractions and quasi-analytic functions; 
and it also has importance for statisticians since there is sometimes occasion to 
be satisfied that a problem of finding a frequency distribution has been uniquely 
solved by the ascertainment of its moments or semi-invariants. Stieltjes himself
considered a more general problem: given a set of constants $c_0, c_1, \dots, c_r, \dots$
does there exist a function $F$, non-decreasing and possessing an
infinite number of points of increase, such that

(3)  $\int_0^\infty x^r \, dF = c_r$

and under what conditions is $F$ unique, except for an additive constant?
Stieltjes showed that if we express the series

(4)  $\sum_{r=0}^{\infty} (-1)^r \frac{c_r}{z^{r+1}}$

as a continued fraction of the form

$\cfrac{1}{a_1 z + \cfrac{1}{a_2 + \cfrac{1}{a_3 z + \cfrac{1}{a_4 + \cdots + \cfrac{1}{a_{2n-1} z + \cfrac{1}{a_{2n} + \cdots}}}}}}$

it is a necessary and sufficient condition for the existence of at least one $F$ that
all the $a$'s be positive; and that the function is unique or not according as the
series $\sum a_r$ diverges or converges. (If the $a$'s are positive it must do one or
the other.) The integral of equation (3) is to be interpreted in the general
Stieltjes sense, so that the result applies to discontinuous as well as to continuous
distributions. This is also true of the results obtained below.

Hamburger [2] discussed the similar problem when the limits of the integral
in equation (3) are $\pm\infty$, and showed that a function $F$ exists if the expression
of (4) as a continued fraction of the form

bo b i 6s 

gives positive values of the $b$'s. In order that $F$ may be unique it is necessary
and sufficient that the continued fraction be completely (vollständig) convergent
in the sense defined by Hamburger.

Unfortunately these criteria, though mathematically complete, are not very 
useful to statisticians because as a rule it is too difficult to express the coefficients
a and b explicitly enough in terms of the given c’s to enable questions of sign or 
of convergence to be decided. So far as I know, no more convenient criterion 
for the general Stieltjes problem has been found; but progress is possible if one 
considers the narrower question: given a set of moments, is the distribution 
which furnished them unique, that is to say, can any other distribution have 
furnished them? This is more limited than the Stieltjes problem because we 
know that at least one solution exists. 

Contributions to this subject have been made by Lévy [3] and Carleman [4].
Lévy shows that if moments of all orders exist and are positive it is a sufficient
condition for them to determine a distribution uniquely that $\mu_n^{1/n}/n$ remains
finite as $n$ tends to infinity. (Here and elsewhere in this paper $\mu_r$ refers to the
moment of order $r$ about any point, not necessarily the mean.) Carleman shows
that, for the case of limits $-\infty$ to $+\infty$, the moments determine the distribution
uniquely if

$\sum_{r=0}^{\infty} \frac{1}{(\mu_{2r})^{1/(2r)}}$

diverges. For the limits 0 to $\infty$ he gives the corresponding series

$\sum_{r=1}^{\infty} \frac{1}{(\mu_r)^{1/(2r)}}$,

a criterion which can be improved upon, as will be shown below.

The purpose of this paper is to develop criteria of this kind more systematically 
and to give more general criteria suitable in cases where the moments are not 
known explicitly but the behavior of the frequency distribution at its terminals 
is known. 

Three preliminary points necessary for the later argument may be noted.

(1) Define the absolute moment of order $r$ by

$\nu_r = \int_{-\infty}^{\infty} |x|^r \, dF$

and recall that

$\nu_1 < \nu_2^{1/2} < \nu_3^{1/3} < \dots < \nu_r^{1/r} < \dots$

(cf. Hardy and others, [5]). In other words the quantities $\nu_r^{1/r}$ form an increasing
positive sequence and their reciprocals a decreasing positive sequence.

(2) The quantity $\nu_n^{1/n}/n$ must either tend to a limit or diverge to infinity as
$n \to \infty$. For suppose that

$\overline{\lim} \, \nu_n^{1/n}/n = k$,
$\underline{\lim} \, \nu_n^{1/n}/n = l$.

Writing temporarily $\nu_n^{1/n} = a_n$, we have that, given $\epsilon$, there is an $N$ such that

$a_n/n > k - \epsilon$

for an infinity of values of $n$ greater than $N$. Similarly there is an $M$ such that

$a_n/n < l + \epsilon$

for an infinity of values of $n$ greater than $M$. Now choose $p$ such that $a_p$, $a_{p+1}$
are two consecutive values, one near the upper limit and one near the lower
limit. This can always be done and we can take $p$ as large as we please. We
then have

$a_p > p(k - \epsilon)$
$a_{p+1} < (p + 1)(l + \epsilon)$

and hence, since $a_{p+1} > a_p$,

$(k - \epsilon)p < (p + 1)(l + \epsilon)$

giving

$(k - l) < \frac{l}{p} + 2\epsilon + \frac{\epsilon}{p}.$

Thus $k - l$ can be made as small as we please and is thus zero.

The argument can be very simply adapted to the case in which $k$ is infinite,
and if $l$ is not finite $k$, being not less than $l$, is infinite. Thus as $n \to \infty$ either
$\lim a_n/n$ exists or $a_n/n \to \infty$.¹

(3) If any moment fails to converge, so will all moments of higher order. It
is evident that more than one distribution can exist having a limited number
of finite moments given and the remainder infinite. Thus we need only consider
the case when moments of all orders exist. Furthermore, if any even moment
exists the absolute moment of next lowest order must exist; for if
$\int_{-\infty}^{\infty} x^{2n} \, dF$ exists, $\int_{-\infty}^{0} x^{2n} \, dF$ and $\int_{0}^{\infty} x^{2n} \, dF$ exist separately, each being positive.
Hence $\int_{-\infty}^{0} x^{2n-1} \, dF$ and $\int_{0}^{\infty} x^{2n-1} \, dF$ exist separately and thus

$\int_{-\infty}^{\infty} |x^{2n-1}| \, dF = -\int_{-\infty}^{0} x^{2n-1} \, dF + \int_{0}^{\infty} x^{2n-1} \, dF$

exists. Hence we need only consider the case in
which absolute moments of all orders exist.

Theorem 1. A set of moments determines a distribution uniquely if the series
$\sum_{r=0}^{\infty} \frac{\nu_r t^r}{r!}$ converges for some real non-zero $t$.

Consider the characteristic function

$\phi(t) = \int_{-\infty}^{\infty} e^{ixt} \, dF.$

This is uniformly continuous in $t$, and so are its derivatives of all orders. Thus
we have, in the neighborhood of $t = 0$, the Maclaurin expansion

$\phi(t) = \sum_{r=0}^{n} \frac{t^r}{r!} \left[\frac{d^r \phi}{dt^r}\right]_{t=0} + R = \sum_{r=0}^{n} \frac{(it)^r}{r!} \mu_r + R.$

¹ This proof is necessary to the use of limits in the following theorems, but Theorems 2
and 3 are equally valid if $\overline{\lim}$ is substituted for $\lim$ therein. It is not generally true that
if $a_n$ and $b_n$ are increasing monotonic sequences either $\lim a_n/b_n$ exists or $a_n/b_n \to \infty$ as
$n \to \infty$.

Consequently, under the condition of the theorem, which implies that $\sum \frac{(it)^r}{r!} \mu_r$
is absolutely convergent for some radius $\rho$, $\phi(t)$ has a Taylor expansion in the
neighborhood of the origin and is thus uniquely determined by the moments for
$|t| < \rho$. Furthermore, in the neighborhood of $t = t_0$ we have

$\phi(t) = \sum \frac{(t - t_0)^r}{r!} \, i^r \left\{ \int_{-\infty}^{\infty} x^r e^{i t_0 x} \, dF \right\} + R.$

The modulus of the coefficient of $\frac{(t - t_0)^r}{r!}$ is not greater than $\nu_r$. Therefore $\phi(t)$
can be expanded in the neighborhood of $t = t_0$ in a Taylor series with a radius
of convergence at least equal to $\rho$. Hence the function defining $\phi(t)$ in the
neighborhood of the origin can be continued analytically throughout the range
$-\infty$ to $+\infty$ and $\phi(t)$ is uniquely determined in that range.

But the characteristic function uniquely determines the distribution; and
hence the theorem follows.

As a result of Theorem 1 we have the following generalization of the criterion
given by Lévy.

Theorem 2. A set of moments completely determines a distribution if
$\lim_{n \to \infty} \nu_n^{1/n}/n$ is finite.

It has already been seen that unless $\nu_n^{1/n}/n$ becomes infinite the limit exists.
By the Cauchy test for convergence the series $\sum \frac{\nu_r t^r}{r!}$ converges if

(7)  $\lim_{n \to \infty} \left( \frac{\nu_n t^n}{n!} \right)^{1/n} < 1.$

As $n \to \infty$, $(n!)^{1/n}$ tends, in accordance with Stirling's theorem, to
$(\sqrt{2\pi n} \, e^{-n} n^n)^{1/n}$, i.e. to $n/e$. Consequently the condition (7) becomes

$\lim_{n \to \infty} [\nu_n^{1/n}/n] \, et < 1.$

Thus if $\lim \nu_n^{1/n}/n = k$, say, the inequality (7) is satisfied for $t < 1/(ek)$ and the
theorem follows.

An important corollary, which enables us to disregard the absolute moments
(which may not be given if part of the range is negative) is

Theorem 3. A set of moments uniquely determines a distribution if
$\lim \mu_{2n}^{1/(2n)}/n$ is finite.

For

$\nu_{2n-1}^{1/(2n-1)} \le \nu_{2n}^{1/(2n)} = \mu_{2n}^{1/(2n)}.$

Thus,

$\lim \frac{1}{2n - 1} \nu_{2n-1}^{1/(2n-1)} \le \lim \frac{2n}{2n - 1} \cdot \frac{1}{2n} \mu_{2n}^{1/(2n)}$

and is therefore finite if the limit on the right is finite. Thus $\lim \nu_n^{1/n}/n$, which
cannot be greater than the greater of the two limits of $\nu_{2n-1}^{1/(2n-1)}/(2n - 1)$ and
$\nu_{2n}^{1/(2n)}/(2n)$, must be finite; and the theorem follows from Theorem 2.

Now consider the series $\sum_{r=0}^{\infty} \frac{1}{\nu_r^{1/r}}$. Since the successive terms form a monotonic
sequence it is a sufficient as well as a necessary condition for convergence that
$n/\nu_n^{1/n}$ tend to zero. Thus, if the series is divergent $n/\nu_n^{1/n}$ cannot tend to zero
and so $\nu_n^{1/n}/n$ cannot become infinite. Hence it must tend to a finite limit, which
may in particular be zero. Hence from Theorem 3 we get

Theorem 4. A frequency distribution is uniquely determined by its moments if
$\sum_{r=0}^{\infty} \frac{1}{\nu_r^{1/r}}$ diverges.

Since $1/\nu_r^{1/r}$ is a decreasing sequence the series $\sum 1/\nu_r^{1/r}$ converges or diverges
with $\sum 1/\mu_{2r}^{1/(2r)}$. The Carleman criterion, given by him for the case of limits
$\pm\infty$, follows. For the case of limits 0 to $\infty$ the absolute moments are the same
as the moments and the criterion can be the divergence of either $\sum 1/\mu_r^{1/r}$ or
$\sum 1/\mu_{2r}^{1/(2r)}$. Since $\mu_r$ is greater than unity in the type of case under consideration
the former series provides a more stringent test than that given by Carleman.
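
To see how such a series behaves on concrete moment sequences, the following sketch compares partial sums of $\sum 1/\mu_{2r}^{1/(2r)}$ for two cases taken from this paper: the Stieltjes example (2), with $\mu_r = e^{r(r+2)/4}$, and the exponential law with $\mu_r = r!$. The sum is started at $r = 1$, since the term for $r = 0$ is not defined as written; working with $\log \mu_r$ is our choice, made to avoid overflow.

```python
import math

def carleman_partial_sum(log_moment, terms):
    """Sum of 1 / mu_{2r}**(1/(2r)) for r = 1..terms, given r -> log(mu_r)."""
    return sum(math.exp(-log_moment(2 * r) / (2 * r)) for r in range(1, terms + 1))

def log_mu_stieltjes(r):
    """log mu_r for the Stieltjes example (2): mu_r = exp(r (r + 2) / 4)."""
    return r * (r + 2) / 4

def log_mu_exponential(r):
    """log mu_r for dF = exp(-x) dx: mu_r = r!."""
    return math.lgamma(r + 1)

for terms in (100, 1000, 10000):
    print(terms,
          carleman_partial_sum(log_mu_stieltjes, terms),
          carleman_partial_sum(log_mu_exponential, terms))
```

For example (2) each term equals $e^{-(r+1)/2}$, so the partial sums settle at a finite value and the criterion gives no assurance of uniqueness, as one would expect for a family of distributions sharing the same moments. For the exponential law the terms behave like $e/(2r)$ and the partial sums keep growing.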

At first sight it is rather surprising that the uniqueness of the distribution 
depends only on the behavior of the even moments, particularly when, by a
simple extension of the above result, it is seen that a sufficient condition for 
uniqueness is the divergence of, for example, $\sum 1/\mu_{4n}^{1/(4n)}$, or of the like series formed
from any infinite subset chosen from the moments. It will, however, be remembered that the odd
moments are conditioned to some extent by the even moments, and that uniqueness
is really determined by the limiting form of $\nu_n$ as $n$ tends to infinity.

It is evident that other tests may be derived from Theorem 1 by using the 
various tests for the convergence of an infinite series. For instance it is a sufficient
condition for a set of moments to determine uniquely a distribution with
positive range that 

— 1 _L _L wftprp ** ^ 

n\f faTTji 1 + » + W + 7’ 0 > 0 

i.e. that 

« ^, = 1 + »- +0 (^)' T>0 - 

It may be noted in passing that the distribution

$dF = e^{-x} \, dx$,  $0 < x < \infty$,

for which

$\mu_r$ (about origin) $= r!$

is completely determined by its moments. In fact, by direct reference to
Theorem 1 we see that the series $\sum (it)^r$ converges for $|t| < 1$.

A frequency distribution of finite range is uniquely determined by its moments.
For if the range is 0 to $A$ we have

$\mu_r = \int_0^A x^r \, dF \le A^r$

and hence $1/\mu_r^{1/r} \ge 1/A$ so that the series $\sum 1/\mu_r^{1/r}$ is divergent.

A proof for the case when the frequency distribution is continuous has been
given by Lévy, though on entirely different lines from the above.
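
The two remarks just made lend themselves to a quick numerical illustration. For the exponential law the sketch below evaluates $\nu_n^{1/n}/n$ with $\nu_n = n!$ and the radius $1/(ek)$ that appears in the proof of Theorem 2. For the finite range it uses the uniform law on $(0, A)$, which is our own illustrative choice (its moments are $\mu_r = A^r/(r+1)$) and is not taken from the text.

```python
import math

def levy_ratio_exponential(n):
    """nu_n**(1/n) / n for dF = exp(-x) dx, where nu_n = n!; lgamma avoids overflow."""
    return math.exp(math.lgamma(n + 1) / n) / n

def finite_range_term(r, A):
    """1 / mu_r**(1/r) for the uniform law on (0, A), where mu_r = A**r / (r + 1)."""
    return (r + 1) ** (1 / r) / A

k = levy_ratio_exponential(10 ** 6)   # close to the limiting value 1/e
radius = 1 / (math.e * k)             # close to 1, matching convergence for |t| < 1
print(k, radius)

A = 3.0                               # hypothetical upper end of the range
print(min(finite_range_term(r, A) for r in range(1, 200)), 1 / A)
```

The ratio stays finite, so Theorem 2 applies to the exponential law, and every term of the finite-range series is at least $1/A$, so that series cannot converge.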

Theorem 6. A frequency distribution of infinite range is uniquely determined
by its moments if it tends to zero at the infinite terminals faster than $e^{-x}$.

Consider first of all the case when only one end of the range is infinite, so
that we may take the range to be 0 to $\infty$.

If $(\mu_n/n!)^{1/n}$ has a finite limit the distribution is unique, by Theorem 2. We
have then only to consider the cases (if any) in which $(\mu_n/n!)^{1/n}$ tends to infinity.
It will be shown that in fact such cases do not occur.

Given any (small) $\epsilon$ there exists an $X$ such that

$f(x) < \epsilon e^{-x}$,  $x > X$,

where $f(x)$ is the distribution. Thus

(9)  $\int_X^\infty f(x) x^n \, dx < \epsilon \int_X^\infty e^{-x} x^n \, dx < \epsilon \, n!.$

This is true for all $n$ and $X$ is independent of $n$. Now,

$\int_0^\infty f(x) x^n \, dx = \int_0^X f(x) x^n \, dx + \int_X^\infty f(x) x^n \, dx.$

The first integral on the right is not greater than $X^n$. The integral on the left
tends, for large $n$, to something of greater order than $n!$, by our hypothesis, and
hence to something of greater order than $n^n$. This is of greater order than $X^n$
(since $X$, however large, is independent of $n$) and consequently the second integral
on the right is also of greater order than $n!$. But this is contrary to
equation (9).

The case for the range which is infinite in both directions may be dealt with
similarly.

It is easily seen that the two examples of equations (1) and (2) do not tend
to zero faster than $e^{-x}$.

Except for the general result of Stieltjes, all the above criteria provide sufficient
conditions, but whether the condition of Theorem 1 is also necessary is
not certain. An inquiry into the circumstances in which the moment-series 
of Theorem 1 does not converge throws some light on the question. 

It will be remembered that the characteristic function always exists and is
uniformly continuous in $t$. Since the moments of all orders are assumed to exist
we always have

$\left[\frac{d^r \phi}{dt^r}\right]_{t=0} = (i)^r \mu_r.$

Thus, if $\phi(t)$ can be expanded in an infinite Taylor series that series must be
$\sum \frac{(it)^r}{r!} \mu_r$. And if this series does not converge then $\phi(t)$ cannot be expanded
as an infinite Taylor series. But it can always be expanded in the finite form
with remainder

$\phi(t) = \sum_{r=0}^{n} \frac{(it)^r}{r!} \mu_r + R.$

Thus, when the series does not converge, $\phi(t)$ can be expanded in powers of $t$
only asymptotically.

Now it is known that there exist an infinite number of functions which have
a given set of coefficients in an asymptotic expansion; for instance, if $\psi(t)$ has
an asymptotic expansion in $t$ the functions $\psi(t) + \lambda t^{\log t}$ all have the same
expansion. It is therefore hardly surprising that when the conditions of
Theorem 1 break down there can be more than one frequency distribution with
the same set of moments.

But it does not follow from what has been said that there must be more
than one frequency distribution. There must be more than one function, but
those functions may not qualify as frequency distributions, e.g. they may be
negative in part of the range. In the example just given $t^{\log t}$ cannot be a
characteristic function, for it does not obey the well-known condition that
$\phi(t)$ and $\phi(-t)$ should be conjugate.

However, the question is more of mathematical than of statistical interest
since the criteria provided above are likely to be adequate for the distributions
encountered in practice. For example they establish the uniqueness of the
Pearson curves (including the normal curve), the Poisson and the binomial. It
would seem that distributions like those of equations (1) and (2) will appear
only as statistical curiosities.
ON SAMPLES FROM A NORMAL BIVARIATE POPULATION

By C. T. Hsu 

1. Introduction. In a number of papers written during the last ten years,
J. Neyman and E. S. Pearson¹ have discussed certain general principles
underlying the choice of tests of statistical hypotheses. They have suggested that
any formal treatment of the subject requires in the first place the specification
of (i) the hypothesis to be tested, say $H_0$, (ii) the admissible alternative
hypotheses. An appropriate test will then consist of a rule to be applied to
observational data, for rejecting $H_0$ in such a way that (iii) the risk of rejecting
$H_0$ when it is true is fixed at some desired value (e.g., 0.05 or 0.01), (iv) the risk
of failing to reject $H_0$ when some one of the admissible alternatives is true is
kept as small as possible. With these general principles in mind, they have
investigated how best the condition (iv) may be satisfied in different classes of
problems. In many cases, though not in all, it has been found that the
conditions are satisfied by the test obtained from the use of what has been termed
the likelihood ratio, [9], [10], [14]. Once the problem has been specified, the
test criterion is usually very easily found, although its sampling distribution,
if $H_0$ is true, often presents great difficulties. In the present paper, I propose
to use this method to obtain appropriate tests for a number of hypotheses
concerning two normally correlated variables. The investigation was suggested
by a recent application of the method by W. A. Morgan [6] to a problem
originally discussed by D. J. Finney [3].

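2. The hypotheses and the appropriate criteria. A sample of two variables
$x_1$ and $x_2$ is supposed to have been drawn at random from a normal bivariate
population, with the distribution

(1)    $p(x_1, x_2) = \dfrac{1}{2\pi\sigma_1\sigma_2\sqrt{1-\rho_{12}^2}} \exp\left\{-\dfrac{1}{2(1-\rho_{12}^2)}\left[\left(\dfrac{x_1-\xi_1}{\sigma_1}\right)^2 - 2\rho_{12}\left(\dfrac{x_1-\xi_1}{\sigma_1}\right)\left(\dfrac{x_2-\xi_2}{\sigma_2}\right) + \left(\dfrac{x_2-\xi_2}{\sigma_2}\right)^2\right]\right\},$

where $\xi_1$, $\xi_2$, $\sigma_1$, $\sigma_2$, and $\rho_{12}$ are the population parameters.

Morgan tested the hypothesis that the variances of the two variables are
equal, i.e.,

$H_1$: $\sigma_1 = \sigma_2$.


¹ See bibliography at the end of the paper.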
Other hypotheses that will be considered in the present paper are as follows:

$H_2$: Assuming $\sigma_1 = \sigma_2$; to test $\rho_{12} = \rho_0$.

$H_3$: Assuming $\sigma_1 = \sigma_2$; to test $\xi_1 = \xi_2$.

$H_4$: To test simultaneously $\sigma_1 = \sigma_2$, $\rho_{12} = \rho_0$.

$H_5$: To test simultaneously $\sigma_1 = \sigma_2$, $\xi_1 = \xi_2$.

$H_6$: Assuming $\sigma_1 = \sigma_2$ and $\xi_1 = \xi_2$; to test $\rho_{12} = \rho_0$.

$H_7$: Assuming $\sigma_1 = \sigma_2$ and $\rho_{12} = \rho_0$; to test $\xi_1 = \xi_2$.


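Derivation of the criteria. Let $x_{1i}$, $x_{2i}$ be the measurements of the two
characters on the $i$th individual of the sample; then the joint elementary probability
law of the two sets of $n$ observations $E = (x_{11}, x_{12}, \ldots, x_{1n}; x_{21}, x_{22}, \ldots, x_{2n})$ is

(2)    $p(E \mid \xi_1, \xi_2, \sigma_1, \sigma_2, \rho_{12}) = \left(\dfrac{1}{2\pi\sigma_1\sigma_2\sqrt{1-\rho_{12}^2}}\right)^n \exp\left\{-\dfrac{1}{2(1-\rho_{12}^2)}\sum_{i=1}^{n}\left[\left(\dfrac{x_{1i}-\xi_1}{\sigma_1}\right)^2 - 2\rho_{12}\left(\dfrac{x_{1i}-\xi_1}{\sigma_1}\right)\left(\dfrac{x_{2i}-\xi_2}{\sigma_2}\right) + \left(\dfrac{x_{2i}-\xi_2}{\sigma_2}\right)^2\right]\right\}.$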


It will be convenient to denote by A, B, C, D, the following conditions of the
population from which the sample is supposed to be drawn.

(A) that stated in equation (1).

(B) that stated in the equation for $H_1$, namely
$\sigma_1 = \sigma_2 = \sigma$ ($\sigma$ being unspecified).

(C) $\xi_1 = \xi_2 = \xi$ ($\xi$ being unspecified).

(D) $\rho_{12} = \rho_0$.


Neyman and Pearson's method affords a simple rule for obtaining appropriate
test criteria once two sets of conditions have been defined. These are

(a) the conditions which can be assumed to be satisfied in any case, and

(b) the conditions which are satisfied if the hypothesis to be tested is true.

The conditions (a) define a class $\Omega$ of admissible populations, and the
conditions (b) define a sub-class $\omega$ of $\Omega$ to which the population must belong if the
hypothesis tested be true.

The maximum value of $p(E \mid \xi_1, \xi_2, \sigma_1, \sigma_2, \rho_{12})$ when the parameters vary in
such a way that the population sampled always belongs to $\Omega$ is called $p(\Omega\ \text{max.})$.
The maximum value when the population is restricted to $\omega$ is called $p(\omega\ \text{max.})$.
The likelihood ratio for testing the hypothesis specifying the subset $\omega$ has been
defined to be

(3)    $\lambda = \dfrac{p(\omega\ \text{max.})}{p(\Omega\ \text{max.})}.$

It will be seen that $0 \le \lambda \le 1$. By referring $\lambda$, or a monotonic function of $\lambda$,
to its sampling distribution when the hypothesis tested is true, we obtain a
scale on which to assess our judgment of the truth of the hypothesis tested.

For each of the hypotheses $H_1$ to $H_7$, $\lambda$ of (3) can be found. However, we
shall use a more convenient criterion

(4)    $L = \lambda^{2/n}$

which is a monotonic function of $\lambda$.

Thus the respective test criteria are found to be

For $H_1$:

(5)    $L_1 = \dfrac{4s_1^2s_2^2(1 - r_{12}^2)}{(s_1^2 + s_2^2)^2(1 - R_1^2)}$

where $R_1 = \dfrac{2r_{12}s_1s_2}{s_1^2 + s_2^2}$ is the estimate of $\rho_{12}$ when $\sigma_1$ and $\sigma_2$ are assumed to be equal.

For $H_2$:

(6)    $L_2 = \dfrac{(1 - \rho_0^2)(1 - R_1^2)}{(1 - \rho_0R_1)^2}.$

For $H_3$:

(7)    $L_3 = 1\Big/\left(1 + \dfrac{(\bar x_1 - \bar x_2)^2}{s_1^2 + s_2^2 - 2r_{12}s_1s_2}\right).$

For $H_4$:

(8)    $L_4 = \dfrac{4(1 - \rho_0^2)s_1^2s_2^2(1 - r_{12}^2)}{(s_1^2 + s_2^2)^2(1 - \rho_0R_1)^2} = L_1L_2.$

For $H_5$:

(9)    $L_5 = \dfrac{4s_1^2s_2^2(1 - r_{12}^2)}{\{s_1^2 + s_2^2 + \tfrac12(\bar x_1 - \bar x_2)^2\}^2(1 - R_2^2)} = L_1L_3.$

For $H_6$:

(10)   $L_6 = \dfrac{(1 - \rho_0^2)(1 - R_2^2)}{(1 - \rho_0R_2)^2}$

where $R_2 = \dfrac{2r_{12}s_1s_2 - \tfrac12(\bar x_1 - \bar x_2)^2}{s_1^2 + s_2^2 + \tfrac12(\bar x_1 - \bar x_2)^2}$ is the estimate of $\rho_{12}$ when both the $\sigma$'s and
the $\xi$'s are assumed to be equal.

For $H_7$:

(11)   $L_7 = 1\Big/\left(1 + \dfrac{(1 + \rho_0)(\bar x_1 - \bar x_2)^2}{2(s_1^2 - 2\rho_0r_{12}s_1s_2 + s_2^2)}\right).$

The different hypotheses are also given in Table V, at the end of this paper,
together with the conditions defining the sets $\Omega$ and $\omega$, and the appropriate
likelihood criteria.
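As an illustrative sketch only, the criteria can be evaluated from sample summaries. The code assumes that (5) to (11) have the forms $L_1 = 4s_1^2s_2^2(1-r_{12}^2)/\{(s_1^2+s_2^2)^2(1-R_1^2)\}$, $L_2 = (1-\rho_0^2)(1-R_1^2)/(1-\rho_0R_1)^2$, and so on, and that $s_1$, $s_2$ are standard deviations with divisor $n$; the function name `criteria` and the numerical inputs are hypothetical.

```python
def criteria(s1, s2, r12, xbar1, xbar2, rho0):
    """Evaluate L1..L7 under the assumed forms of equations (5)-(11)."""
    a = s1 ** 2 + s2 ** 2           # s1^2 + s2^2
    b = 2.0 * r12 * s1 * s2         # 2 r12 s1 s2
    d2 = (xbar1 - xbar2) ** 2       # squared difference of the sample means
    R1 = b / a                      # estimate of rho12 when sigma1 = sigma2
    R2 = (b - 0.5 * d2) / (a + 0.5 * d2)   # ... and, in addition, xi1 = xi2
    num = 4.0 * s1 ** 2 * s2 ** 2 * (1.0 - r12 ** 2)
    return {
        "L1": num / (a ** 2 * (1.0 - R1 ** 2)),
        "L2": (1.0 - rho0 ** 2) * (1.0 - R1 ** 2) / (1.0 - rho0 * R1) ** 2,
        "L3": 1.0 / (1.0 + d2 / (a - b)),
        "L4": (1.0 - rho0 ** 2) * num / (a ** 2 * (1.0 - rho0 * R1) ** 2),
        "L5": num / ((a + 0.5 * d2) ** 2 * (1.0 - R2 ** 2)),
        "L6": (1.0 - rho0 ** 2) * (1.0 - R2 ** 2) / (1.0 - rho0 * R2) ** 2,
        "L7": 1.0 / (1.0 + (1.0 + rho0) * d2 / (2.0 * (a - rho0 * b))),
    }

# Hypothetical sample summaries, chosen only for illustration.
L = criteria(s1=1.2, s2=0.9, r12=0.5, xbar1=10.3, xbar2=9.8, rho0=0.6)
```

With these inputs every criterion lies between 0 and 1, and the factorizations $L_4 = L_1L_2$ and $L_5 = L_1L_3$ hold to rounding error.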

To complete the solution we must find the distributions of $L$ or some
monotonic function of $L$ in each case when the hypothesis tested is true, in order to
assess the significance of an observed value of $L$.


3. The distributions of the criteria. In order to simplify the problem of
finding the distributions of the criteria, consider the following transformation:

(12)   $X_i = (x_{1i} - x_{2i})/\sqrt{2}, \qquad Y_i = (x_{1i} + x_{2i})/\sqrt{2}.$

It is clear that in view of (1) $X$ and $Y$ will be two normally correlated variables.
We shall denote this property by $A'$ corresponding to $A$. The conditions $B'$,
$C'$, $D'$ corresponding to $B$, $C$, $D$ respectively are as follows:

$B'$: $\rho_{XY} = 0$,

$C'$: $\xi_X = 0$,

$D'$: $\sigma_Y^2 = \gamma_0\sigma_X^2$,

where

(13)   $\gamma_0 = \dfrac{1 + \rho_0}{1 - \rho_0}$    (when $\rho_{XY} = 0$).

Thus we have the equivalent hypotheses $H_1', H_2', \ldots, H_7'$ corresponding to
$H_1, H_2, \ldots, H_7$. The likelihood ratios $L_1', L_2', \ldots, L_7'$ may be determined in
the same way as before, and, in view of the transformation (12), it will be
seen that they are equal to $L_1, L_2, \ldots, L_7$ respectively.

The tests of the hypotheses $H_1'$, $H_2'$, $H_3'$ are now seen to be well known.

The test of $H_1'$: $\rho_{XY} = 0$ is the test for significance of a correlation coefficient,
and the criterion $L_1$ becomes

(14)   $L_1 = \lambda^{2/n} = 1 - r_{XY}^2.$

This test has been dealt with by Morgan [6] and Pitman [15], and has been
referred to above.
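The equivalence in (14) can be checked numerically. The sketch below assumes the transformation $X = (x_1 - x_2)/\sqrt{2}$, $Y = (x_1 + x_2)/\sqrt{2}$ and the form of $L_1$ quoted for (5); the data and the helper `sd_and_r` are hypothetical.

```python
import math

def sd_and_r(a, b):
    """Standard deviations (divisor n) and product-moment correlation of two lists."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    va = sum((x - ma) ** 2 for x in a) / n
    vb = sum((y - mb) ** 2 for y in b) / n
    cab = sum((x - ma) * (y - mb) for x, y in zip(a, b)) / n
    return math.sqrt(va), math.sqrt(vb), cab / math.sqrt(va * vb)

# Hypothetical paired observations.
x1 = [4.1, 5.3, 6.0, 5.5, 7.2, 6.8]
x2 = [3.9, 5.0, 6.4, 5.1, 6.5, 7.9]

# Transformed variables: difference and sum, each scaled by 1/sqrt(2).
X = [(p - q) / math.sqrt(2) for p, q in zip(x1, x2)]
Y = [(p + q) / math.sqrt(2) for p, q in zip(x1, x2)]

s1, s2, r12 = sd_and_r(x1, x2)
_, _, r_XY = sd_and_r(X, Y)

R1 = 2 * r12 * s1 * s2 / (s1 ** 2 + s2 ** 2)
L1 = 4 * s1 ** 2 * s2 ** 2 * (1 - r12 ** 2) / ((s1 ** 2 + s2 ** 2) ** 2 * (1 - R1 ** 2))
L1_transformed = 1 - r_XY ** 2
```

The two values agree to rounding error, so testing equality of variances in the original variables amounts to testing zero correlation between the difference and the sum.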

The test of $H_2'$: $\sigma_Y^2/\sigma_X^2 = \gamma_0$ when $\rho_{XY} = 0$ can be treated as an extension
of Fisher's z-test [5], since $\gamma_0$ is specified. If we write

(15)   $u = \dfrac{S_Y^2}{S_X^2} = \dfrac{1 + R_1}{1 - R_1} = \dfrac{s_1^2 + s_2^2 + 2r_{12}s_1s_2}{s_1^2 + s_2^2 - 2r_{12}s_1s_2}$

the test criterion $L_2$ of (6) may be written

(16)   $L_2 = \dfrac{4u}{\gamma_0(1 + u/\gamma_0)^2}.$

It is well known that if $H_2'$ is true, then

(17)   $p(u) = \dfrac{1}{\gamma_0B[\tfrac12(n-1), \tfrac12(n-1)]}\left(\dfrac{u}{\gamma_0}\right)^{\frac12(n-3)}\left(1 + \dfrac{u}{\gamma_0}\right)^{-(n-1)}$

and the test appropriate to $H_2'$ and therefore of $H_2$ is the associated z-test
($z = \tfrac12\log(u/\gamma_0)$) with degrees of freedom $f_1 = f_2 = n - 1$. It may be easily shown
that the two values of $u$ cutting off equal tail areas from the distribution $p(u)$
will correspond to a single value of $L_2$.

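The test of $H_3'$: $\xi_X = 0$ when $\rho_{XY} = 0$ is in the form of "Student's" t test.
If we write

(18)   $\dfrac{t^2}{n-1} = \dfrac{\bar X^2}{S_X^2} = \dfrac{(\bar x_1 - \bar x_2)^2}{s_1^2 + s_2^2 - 2r_{12}s_1s_2}$

it follows that the test criterion $L_3$ of (7) may be written

(19)   $L_3 = 1\Big/\left(1 + \dfrac{t^2}{n-1}\right).$

But it is well known that if $\xi_X = 0$, then

(20)   $p(t) = \dfrac{1}{\sqrt{n-1}\,B[\tfrac12, \tfrac12(n-1)]}\left(1 + \dfrac{t^2}{n-1}\right)^{-\frac12 n}.$

The 5% or 1% points of significance of $t$ may be obtained from Fisher's t-table
[5] with degrees of freedom $f = n - 1$.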

The tests of $H_4$ and $H_5$. We infer from (14), (16) and (19) that $L_1$ is a
function of $r_{XY}$, $L_2$ a function of $S_X$ and $S_Y$, and $L_3$ a function of $\bar X$ and $S_X$. It is
clear that if $r_{XY}$ is distributed independently of $S_X$ and $S_Y$, then $L_1$ and $L_2$ are
independent, i.e.,

(21)   $p(L_1, L_2) = p(L_1)p(L_2)$

and that if $r_{XY}$ is distributed independently of $\bar X$ and $S_X$, then $L_1$ and $L_3$ are
independent, i.e.,

(22)   $p(L_1, L_3) = p(L_1)p(L_3).$

It is known that $\bar X$, $\bar Y$ are independent of $S_X$, $S_Y$, $r_{XY}$; and in addition that
$r_{XY}$ is distributed independently of $S_X$, $S_Y$ if $\rho_{XY} = 0$. Therefore, if $H_1'$ is
true, then the relations (21) and (22) hold. Hence, knowing $p(L_1)$ and $p(L_2)$,
a very simple transformation and integration gives $p(L_4)$. Similarly, the
distribution of $L_5$ may be readily derived from those of $L_1$ and $L_3$.

But from the distribution of $r_{XY}$ when $\rho_{XY} = 0$, by transformation (14), the
distribution of $L_1$ assuming $H_1'$ true is found to be

(23)   $p(L_1) = \dfrac{1}{B[\tfrac12(n-2), \tfrac12]}L_1^{\frac12(n-4)}(1 - L_1)^{-\frac12}.$

If $H_2'$ is true, from (17), by transformation (16) we have

(24)   $p(L_2) = \dfrac{1}{B[\tfrac12(n-1), \tfrac12]}L_2^{\frac12(n-3)}(1 - L_2)^{-\frac12}.$

Again, if $H_3'$ is true, from (20), by transformation (19), we have

(25)   $p(L_3) = \dfrac{1}{B[\tfrac12(n-1), \tfrac12]}L_3^{\frac12(n-3)}(1 - L_3)^{-\frac12}$

which is the same as the distribution of $L_2$. Therefore by comparing (21) and
(22) we see that the distribution of $L_5$ when $H_5'$ is true will be exactly the same
as that of $L_4$ when $H_4'$ is true. We shall therefore confine ourselves to the
problem of obtaining the distribution of $L_4$ from those of $L_1$ and $L_2$.

Now

(26)   $p(L_1, L_2) = \dfrac{1}{B[\tfrac12(n-2), \tfrac12]\,B[\tfrac12(n-1), \tfrac12]}L_1^{\frac12(n-4)}L_2^{\frac12(n-3)}(1 - L_1)^{-\frac12}(1 - L_2)^{-\frac12}.$

Applying the transformation

(27)   $L_4 = L_1L_2, \qquad Z = L_2$

and integrating with respect to $Z$ from 0 to 1, we obtain

(28)   $p(L_4) = \tfrac12(n-2)L_4^{\frac12(n-4)}, \qquad 0 < L_4 < 1.$

Thus we can construct the values of $L_4$ at the 5% and 1% levels for different
values of $n$ as given in Table I.


TABLE I

5% and 1% values of $L_4$ (or $L_5$)

    n        5%        1%
    5      .1357     .0464
    6      .2509     .1000
    7      .3017     .1585
    8      .3684     .2154
    9      .4249     .2683
   10      .4729     .3162
   12      .5493     .3981
   15      .6307     .4924
   20      .7169     .5995
   24      .7616     .6579
   30      .8074     .7197
   40      .8541     .7848
   60      .9019     .8532
  120      .9505     .9249
   ∞      1.0000    1.0000
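A short sketch of how such significance levels follow from (28), assuming $p(L_4) = \tfrac12(n-2)L_4^{\frac12(n-4)}$ on $0 < L_4 < 1$ and that small values of $L_4$ are significant. Integrating gives $P\{L_4 \le x\} = x^{\frac12(n-2)}$, so the level-$\alpha$ point is $x = \alpha^{2/(n-2)}$. The function name `critical_L4` is hypothetical.

```python
def critical_L4(n, alpha):
    """Lower-tail point x with P{L4 <= x} = alpha, from the assumed density (28)."""
    if n <= 2:
        raise ValueError("n must exceed 2")
    return alpha ** (2.0 / (n - 2))

# A few values for comparison with tabulated entries.
values = {(n, a): round(critical_L4(n, a), 4)
          for n in (5, 8, 10, 120)
          for a in (0.05, 0.01)}
```

For example, `critical_L4(5, 0.05)` rounds to .1357 and `critical_L4(10, 0.01)` to .3162.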
The test of $H_6$. In the case of testing $\sigma_Y^2 = \gamma_0\sigma_X^2$, assuming $\rho_{XY}$ and
$\xi_X$ each to be zero, the likelihood estimate of $\sigma_X^2$ becomes $\sum X^2/n$ or $S_X^2 + \bar X^2$.
The distribution of this quantity is the same as that of $S_X^2$ but with degrees of
freedom $n$ instead of $n - 1$. Therefore, by analogy with the previous result
(17) used in testing $H_2$, if we write

(29)   $v = \dfrac{nS_Y^2}{\sum X^2} = \dfrac{S_Y^2}{S_X^2 + \bar X^2} = \dfrac{1 + R_2}{1 - R_2}$

then the likelihood criterion of $H_6$ becomes

(30)   $L_6 = \dfrac{4v}{\gamma_0(1 + v/\gamma_0)^2}$

and

(31)   $p(v) = \dfrac{1}{\gamma_0B[\tfrac12(n-1), \tfrac12 n]}\left(\dfrac{v}{\gamma_0}\right)^{\frac12(n-3)}\left(1 + \dfrac{v}{\gamma_0}\right)^{-(n-\frac12)}.$

Hence the test appropriate to $H_6$ is the associated z-test, $z = \tfrac12\log\left(\dfrac{v}{\gamma_0}\Big/\dfrac{n-1}{n}\right)$,
with $f_1 = n - 1$, $f_2 = n$. We can use the z-table as before.

The test of $H_7$. Here we test whether $\xi_X = 0$. It may be seen that $L_7$ is
a function of $\bar X^2/(S_Y^2 + \gamma_0S_X^2)$. Further, if we assume that $\rho_{XY} = 0$ and also
that $\sigma_Y^2 = \gamma_0\sigma_X^2$, then it will follow that $\sum(X - \bar X)^2$ and $\dfrac{1}{\gamma_0}\sum(Y - \bar Y)^2$ are each
distributed independently as $\chi^2\sigma_X^2$ with $n - 1$ degrees of freedom; and hence
their sum is distributed as $\chi^2\sigma_X^2$ with $2n - 2$ degrees of freedom. Also if $\xi_X = 0$
(and $H_7$ is true) $\bar X$ will be distributed normally about zero with standard error
$\sigma_X/\sqrt{n}$. Hence we may write

(32)   $L_7 = 1\Big/\left(1 + \dfrac{t^2}{2n-2}\right)$

where

(33)   $t = \bar X\Big/\sqrt{\dfrac{\sum(X - \bar X)^2 + \sum(Y - \bar Y)^2/\gamma_0}{n(2n-2)}}$

and is distributed in accordance with "Student's" distribution with $2n - 2$
degrees of freedom,

(34)   $p(t) = \dfrac{1}{\sqrt{2n-2}\,B[\tfrac12, \tfrac12(2n-2)]}\left(1 + \dfrac{t^2}{2n-2}\right)^{-\frac12(2n-1)}.$

In terms of original variables

(35)   $\dfrac{t^2}{2n-2} = \dfrac{\gamma_0\bar X^2}{\gamma_0S_X^2 + S_Y^2} = \dfrac{(1 + \rho_0)(\bar x_1 - \bar x_2)^2}{2(s_1^2 - 2\rho_0r_{12}s_1s_2 + s_2^2)}.$
4. Comparison of the $R_1$-test and $R_2$-test with the $r_{12}$-test in cases where $H_2$
and $H_6$ are true respectively. It will be noted that in the preceding discussion
we have been concerned with three different tests of the hypothesis that $\rho_{12}$
has some specified value $\rho_0$. When there is no information available regarding
the means and standard deviations of $x_1$ and $x_2$, the test is based on the sampling
distribution of the ordinary product-moment coefficient $r_{12}$. If it may be
assumed that $\sigma_1 = \sigma_2$, then we have the estimate

$R_1 = \dfrac{2r_{12}s_1s_2}{s_1^2 + s_2^2}.$

If besides $\sigma_1 = \sigma_2$, it may also be assumed that $\xi_1 = \xi_2$, then we have the
estimate

$R_2 = \dfrac{2r_{12}s_1s_2 - \tfrac12(\bar x_1 - \bar x_2)^2}{s_1^2 + s_2^2 + \tfrac12(\bar x_1 - \bar x_2)^2}.$

From the point of view of testing hypotheses, all these criteria $r_{12}$, $R_1$, $R_2$
follow from the application of the likelihood ratio method. It will be noted
that if $\sigma_1 = \sigma_2$, either the $r_{12}$ or the $R_1$ test may be used. But, insofar as the
likelihood principle is accepted, the latter should be regarded as the "better"
test. Again, if $\sigma_1 = \sigma_2$ and $\xi_1 = \xi_2$, all three tests may be used, but that based
on $R_2$ will be the "best". A question of interest is to investigate just what is
meant by the "better" or the "best" test. We may ask how far the
improvements are sufficient to justify the use of the $R_1$ and $R_2$ tests in place of the more
generally used $r_{12}$ test. One method of comparison is to examine what Neyman
and Pearson [12] have termed the "power function" of the tests.

For example, when testing the hypothesis that a parameter $\theta$ has the value
$\theta_0$ in the population sampled, the power of the test criterion $T$ with regard to
the alternative hypothesis that $\theta = \theta_1 > \theta_0$ is given by the expression $\beta(\theta_1) =
P\{T > T_\alpha \mid \theta = \theta_1\}$ where $T_\alpha$ is the value of $T$ at the level of significance $\alpha$.
This quantity $\beta(\theta)$ measures the chance that the test as specified will detect
the fact that $\theta \ne \theta_0$, i.e., the chance of rejecting the hypothesis when it is not
true. A test whose power function is never less than that of any other test is
termed the uniformly most powerful test.

If the permissible alternative hypotheses to $\theta = \theta_0$ are both $\theta < \theta_0$ and $\theta > \theta_0$,
then the power of the test $T$ is given by the expression

$\beta(\theta_1) = 1 - P\{T'_\alpha < T < T''_\alpha \mid \theta = \theta_1\}$

where $T'_\alpha$ and $T''_\alpha$ are the values of $T$ at both ends of the distribution at the
level of significance $\alpha$. When the test is such that the power function has
a minimum value $\alpha$ at $\theta = \theta_0$, it is said to be unbiased.

A test is termed biased if, for certain alternative hypotheses $\theta \ne \theta_0$, the chance
of rejecting the hypothesis $\theta = \theta_0$ is less than the chance of rejecting this
hypothesis when it is true.
In what follows it is proposed to compare the power functions of the tests
based on $r_{12}$, $R_1$, and $R_2$ in order to obtain more complete evidence of the
extent to which one is "better" than the other.

The distribution of $R_1$.² We have obtained the distribution of $u$ when $H_2'$ and
therefore $H_2$ is true. We are now able to find the distribution of $R_1$ by
applying the transformation of (15). Thus the distribution of $R_1$ in terms of $\rho_0$ is

(36)   $p(R_1 \mid \rho_0) = \dfrac{(1 - \rho_0^2)^{\frac12(n-1)}(1 - R_1^2)^{\frac12(n-3)}}{2^{n-2}B[\tfrac12(n-1), \tfrac12(n-1)]\,(1 - \rho_0R_1)^{n-1}}.$

The significance of $R_1$ may be assessed by the z-test, where we take

(37)   $z = \tfrac12\log\dfrac{u}{\gamma_0} = \tfrac12\log\dfrac{1 + R_1}{1 - R_1} - \tfrac12\log\dfrac{1 + \rho_0}{1 - \rho_0} = z' - \zeta$, say,

with degrees of freedom $f_1 = f_2 = n - 1$. R. A. Fisher's z-table may be used
in this connection.
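Since $\tfrac12\log\{(1+R)/(1-R)\}$ is the inverse hyperbolic tangent of $R$, the statistic of (37) is quickly computed. The sketch below assumes (37) reads $z = \tfrac12\log(u/\gamma_0) = z' - \zeta$ with $u = (1+R_1)/(1-R_1)$ and $\gamma_0 = (1+\rho_0)/(1-\rho_0)$; the function name `z_statistic` is hypothetical.

```python
import math

def z_statistic(R1, rho0):
    """z = z' - zeta, where z' = atanh(R1) and zeta = atanh(rho0)."""
    z_prime = math.atanh(R1)   # (1/2) log((1 + R1) / (1 - R1))
    zeta = math.atanh(rho0)    # (1/2) log((1 + rho0) / (1 - rho0))
    return z_prime - zeta

# Cross-check against (1/2) log(u / gamma0) for an illustrative pair of values.
R1, rho0 = 0.8831, 0.6
u = (1 + R1) / (1 - R1)
gamma0 = (1 + rho0) / (1 - rho0)
z_direct = 0.5 * math.log(u / gamma0)
z_atanh = z_statistic(R1, rho0)
```

The value would then be referred to the z-table with $f_1 = f_2 = n - 1$.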

When $\rho_{12} = 0$, the distribution simplifies to

(38)   $p(R_1 \mid \rho_{12} = 0) = \dfrac{(1 - R_1^2)^{\frac12(n-3)}}{2^{n-2}B[\tfrac12(n-1), \tfrac12(n-1)]} = \dfrac{(1 - R_1^2)^{\frac12(n-3)}}{B[\tfrac12(n-1), \tfrac12]}$

since $2^{n-2}B[\tfrac12(n-1), \tfrac12(n-1)]$ is equal to $B[\tfrac12(n-1), \tfrac12]$ by the duplication
formula [16, p. 240].

The distribution (38) is similar in form to that of $p(r_{12} \mid \rho_{12} = 0)$ with $n - 1$
degrees of freedom instead of $n - 2$. The significance levels of $R_1$ may then
be obtained directly from the r-table [1] for the case $\rho_{12} = 0$, entering with
degrees of freedom $n - 1$.

The distribution of $R_2$. The distribution of $R_2$ may be obtained from that of
$v$ when $H_6'$ and therefore $H_6$ is true. It is

(39)   $p(R_2 \mid \rho_{12} = \rho_0) = \dfrac{(1 + \rho_0)^{\frac12 n}(1 - \rho_0)^{\frac12(n-1)}(1 + R_2)^{\frac12(n-3)}(1 - R_2)^{\frac12(n-2)}}{2^{n-\frac32}B[\tfrac12(n-1), \tfrac12 n]\,(1 - \rho_0R_2)^{n-\frac12}}.$

This agrees with the result first obtained by R. A. Fisher [4] by a different
method. The significance of $R_2$ may be assessed by the z-test, where we take

(40) 



with degrees of freedom $f_1 = n - 1$; $f_2 = n$. The tables for use with the z-test
may be used in this connection.

² Since finding the distribution of $R_1$ (36), (38) and the relation between $R_1$ and $z'$ (37),
my attention has been drawn to a recent paper by DeLury [2] in which the same results
are obtained. Since my method of derivation is different from his, I have thought it
worthwhile to retain it here.

When $\rho_{12} = 0$, the distribution is simplified to

(41)   $p(R_2 \mid \rho_{12} = 0) = \dfrac{1}{2^{n-\frac32}B[\tfrac12(n-1), \tfrac12 n]}(1 + R_2)^{\frac12(n-3)}(1 - R_2)^{\frac12(n-2)}$

which is simply a Pearson Type I curve.

Power functions of $R_1$ and $R_2$. In order to find the power functions of $R_1$ and
$R_2$ with respect to alternative hypotheses $H_t$, specifying $\rho_{12} = \rho_t < \rho_0$,
it will be convenient to consider the incomplete beta function distributions

(42)   $p(x_1) = \dfrac{1}{B[\tfrac12(n-1), \tfrac12(n-1)]}x_1^{\frac12(n-3)}(1 - x_1)^{\frac12(n-3)}$

(43)   $p(x_2) = \dfrac{1}{B[\tfrac12(n-1), \tfrac12 n]}x_2^{\frac12(n-3)}(1 - x_2)^{\frac12(n-2)}$

where $x_1 = \dfrac{u}{\gamma_0(1 + u/\gamma_0)}$ and $x_2 = \dfrac{v}{\gamma_0(1 + v/\gamma_0)}$. From the Tables of the
Incomplete Beta Function [13] we can find the values of $x_1$ and $x_2$ at the significance
level $\alpha'$, i.e.

(44)   $I_{x_1'}[\tfrac12(n-1), \tfrac12(n-1)] = \alpha',$

(45)   $I_{x_2'}[\tfrac12(n-1), \tfrac12 n] = \alpha'.$

The values of $R_1'(\alpha)$, and of $R_2'(\alpha)$, may then be calculated from the relations

(46)   $R_1 = \dfrac{u - 1}{u + 1} = \dfrac{-1 + x_1 + \gamma_0x_1}{1 - x_1 + \gamma_0x_1},$

(47)   $R_2 = \dfrac{v - 1}{v + 1} = \dfrac{-1 + x_2 + \gamma_0x_2}{1 - x_2 + \gamma_0x_2}.$

The power functions of $R_1$ and $R_2$ thus found may be given as follows:

(48)   $\beta'(\rho_t \mid R_1) = P\{R_1 < R_1'(\alpha) \mid \rho_t\},$

(49)   $\beta'(\rho_t \mid R_2) = P\{R_2 < R_2'(\alpha) \mid \rho_t\}.$

In the same way, for any alternative hypothesis $H_t$ specifying $\rho_{12} = \rho_t > \rho_0$,
we can find the values of $x_1$ and $x_2$ at the significance level $\alpha''$, at the other end
of the distribution, i.e.

(50)   $1 - I_{x_1''}[\tfrac12(n-1), \tfrac12(n-1)] = \alpha'',$

(51)   $1 - I_{x_2''}[\tfrac12(n-1), \tfrac12 n] = \alpha''.$

Thence the corresponding values of $R_1''(\alpha)$ and $R_2''(\alpha)$ may be obtained, and their
power functions are

(52)   $\beta''(\rho_t \mid R_1) = P\{R_1 > R_1''(\alpha) \mid \rho_t\},$

(53)   $\beta''(\rho_t \mid R_2) = P\{R_2 > R_2''(\alpha) \mid \rho_t\}.$

The power functions of $R_1$ and $R_2$ with respect to alternative hypotheses
specifying $\rho_{12} = \rho_t < \rho_0$ and $\rho_t > \rho_0$ may now be obtained by adding (48) and (52) or
(49) and (53) or, more simply,

(54)   $\beta(\rho_t \mid R_1) = 1 - P\{R_1'(\alpha) < R_1 < R_1''(\alpha) \mid \rho_t\},$

(55)   $\beta(\rho_t \mid R_2) = 1 - P\{R_2'(\alpha) < R_2 < R_2''(\alpha) \mid \rho_t\}$

where $R_1'(\alpha)$, $R_1''(\alpha)$; $R_2'(\alpha)$, $R_2''(\alpha)$ are the values of $R_1$ and $R_2$ at the two ends
of the distribution at the significance level $\alpha = \alpha' + \alpha''$.

In view of the fact that after transformation the tests based on $R_1$ and $R_2$
are equivalent to tests regarding the equality of variances, it follows from
Neyman and Pearson's work [11] regarding the uniformly most powerful test of the
hypothesis that $\sigma_Y^2/\sigma_X^2 = \gamma_0$, with alternatives $\sigma_Y^2/\sigma_X^2 = \gamma_t < \gamma_0$ (or $\gamma_t > \gamma_0$),
that: (1) if $\sigma_1 = \sigma_2$ and the alternatives to $\rho_{12} = \rho_0$ are that $\rho_{12} = \rho_t < \rho_0$ (or, in a
second case, $\rho_t > \rho_0$) the test based on $R_1$ is the uniformly most powerful test,
i.e., it is more powerful than that based on $r_{12}$; and (2) if $\sigma_1 = \sigma_2$ and $\xi_1 = \xi_2$,
then the test based on $R_2$ is the uniformly most powerful test, i.e., it is more
powerful than those based on either $r_{12}$ or $R_1$.

For illustration, let us take a special case, say

(a)    $n = 10$,    $\rho_0 = 0.6$,    $\alpha' = \alpha'' = 0.025$.

From the tables, we obtain the values

$x_1' = .198902$          $x_2' = .184863$
$x_1'' = .801098$         $x_2'' = .772916$

and by calculation the values

$R_1'(\alpha) = -.0034$        $R_2'(\alpha) = -.0487$
$R_1''(\alpha) = .8831$        $R_2''(\alpha) = .8632.$

The values of the power functions of $R_1$ and $R_2$ for specified values of $\rho_t$ have
been calculated and are given in Table II. For $\rho_t < \rho_0$, a comparison of
columns 2 and 4 will show that the test based on $R_2$ is uniformly more powerful
than that based on $R_1$ (or for $\rho_t > \rho_0$, a comparison of columns 3 and 5).

The unbiased test of $H_2$ and $H_6$. When however the alternatives are that
$\rho_{12} = \rho_t < \rho_0$, and $\rho_t > \rho_0$, questions of bias may be introduced.

In the case of $H_2$, i.e. when $R_1$ is used, it was established by J. Neyman in
his lecture courses [8], that if we test whether $\sigma_Y^2/\sigma_X^2 = \gamma_0$, where the alternatives
are $\gamma_t < \gamma_0$ and $\gamma_t > \gamma_0$, and if the samples of $X$ and $Y$ are of equal size, then
the test based on cutting off equal tail areas of the distribution of $z$ is unbiased
and of the type B [7]. Therefore the same may be said of the $R_1$-test.

In the case of $H_6$, the equivalent transformed test is again whether $\sigma_Y^2/\sigma_X^2 =
\gamma_0$. But the test now corresponds to that in which an estimate of $\sigma_Y^2$ is based
on $f_1 = n - 1$ degrees of freedom and an estimate of $\sigma_X^2$ on $f_2 = n$ degrees of
freedom. The degrees of freedom not being equal, it is known that if equal
tail areas are cut off from the sampling distribution of $x_2$, this test will be
biased. Neyman's result [8] shows that if the lower and upper significance
levels are taken at $x_2'$ and $x_2''$, then the equation

(56)   $x_2'^{\frac12 f_1}(1 - x_2')^{\frac12 f_2} = x_2''^{\frac12 f_1}(1 - x_2'')^{\frac12 f_2}$

should be satisfied if the test is unbiased. Since in the present case, with the
test based on equal tail area critical region, the bias will be very small, the
rejection levels $R_2'(\alpha)$ and $R_2''(\alpha)$ in the numerical investigation given in Table
III have been selected taking equal tail areas for simplicity.
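How nearly the equal-tail points satisfy such a condition can be gauged numerically. The sketch assumes that the unbiasedness condition (56) has the form $x'^{\frac12 f_1}(1-x')^{\frac12 f_2} = x''^{\frac12 f_1}(1-x'')^{\frac12 f_2}$, and uses the equal-tail points quoted for case (a); the function name `log_side` is hypothetical. Logarithms of the two sides are compared.

```python
import math

def log_side(x, f1, f2):
    """Logarithm of x^(f1/2) * (1 - x)^(f2/2), one side of the assumed condition."""
    return 0.5 * f1 * math.log(x) + 0.5 * f2 * math.log(1.0 - x)

n = 10

# R2-test: f1 = n - 1, f2 = n, equal-tail points from case (a).
gap_R2 = log_side(0.772916, n - 1, n) - log_side(0.184863, n - 1, n)

# R1-test: f1 = f2 = n - 1; the symmetric points satisfy the condition exactly.
gap_R1 = log_side(0.801098, n - 1, n - 1) - log_side(0.198902, n - 1, n - 1)
```

Under these assumptions the gap for the $R_1$-test vanishes to rounding error, while that for the $R_2$-test is small but not zero, in keeping with the remark that the bias is very small.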


TABLE II

Values of the power functions of $R_1$ and $R_2$ with respect to alternative hypotheses
$\rho_{12} = \rho_t < \rho_0$ or $\rho_t > \rho_0$

($n = 10$; $\rho_0 = 0.6$; $\alpha' = \alpha'' = 0.025$)

  $\rho_t$    $\beta'(\rho_t|R_1)$   $\beta''(\rho_t|R_1)$   $\beta'(\rho_t|R_2)$   $\beta''(\rho_t|R_2)$
  -0.8        .9984
  -0.6        .9739                        .9807
  -0.4        .9867                        .9005
  -0.2        .7189                        .7360
   0.0        .4960          .0002         .5093          .0001
   0.2        .2744          .0008         .2809          .0006
   0.3        .1825          .0018         .1860          .0015
   0.4        .1106          .0042         .1111          .0037
   0.5        .0576          .0099         .0580          .0093
   0.6        .025           .025          .025           .025
   0.7        .0081          .0678         .0080          .0720
   0.8        .0016          .1995         .0015          .2150
   0.9        .0001          .5950         .0001          .6289
   0.95                      .8979                        .9150
   0.976                     .9866                        .9897

If we now take a special case, similar to (a) above, but taking equal tail areas,
so that

$n = 10$,    $\rho_0 = 0.6$,    $\alpha = 0.05$    ($\alpha' = \alpha'' = \tfrac12\alpha$)

we can obtain the values of the $x$'s and of the $R$'s as before.

The values of the power functions of $R_1$ and $R_2$ for specified values of $\rho_t$ are
given in columns 3 and 4 of Table III. These values are equivalent to the
sums of the corresponding values in Table II. The values of the power
functions of $R_1$ and $R_2$ for the following additional cases are also given in Table III.

(b)    $n = 10$    $\rho_0 = 0.8$    $\alpha = 0.05$

(c)    $n = 20$    $\rho_0 = 0.6$    $\alpha = 0.05$

(d)    $n = 20$    $\rho_0 = 0.8$    $\alpha = 0.05$.

Comparison of the power functions. We may now deal with the question
raised at the beginning of this section, namely, as to what is meant by the
"better" or "best" test. We shall proceed to compare for certain special cases
the power functions of the three tests, all of which are applicable where it may
be assumed that $\sigma_1 = \sigma_2$, $\xi_1 = \xi_2$.

In the first place it will be noted that the power function of the test based on
equal tail areas of the $r_{12}$ distribution is

(57)   $\beta(\rho_t \mid r_{12}) = 1 - P\{r_{12}'(\alpha) < r_{12} < r_{12}''(\alpha) \mid \rho_t\}$

where

(58)   $P\{r_{12} < r_{12}'(\alpha) \mid \rho_0\} = \int_{-1}^{r_{12}'(\alpha)} p(r_{12} \mid \rho_{12} = \rho_0)\,dr_{12} = \tfrac12\alpha,$

       $P\{r_{12} > r_{12}''(\alpha) \mid \rho_0\} = \int_{r_{12}''(\alpha)}^{1} p(r_{12} \mid \rho_{12} = \rho_0)\,dr_{12} = \tfrac12\alpha$

and

(59)   $p(r_{12} \mid \rho_{12} = \rho_0) = \dfrac{(1 - \rho_0^2)^{\frac12(n-1)}(1 - r_{12}^2)^{\frac12(n-4)}}{\pi\,\Gamma(n-2)}\left(\dfrac{d}{d(\rho_0r_{12})}\right)^{n-2}\dfrac{\cos^{-1}(-\rho_0r_{12})}{\sqrt{1 - \rho_0^2r_{12}^2}}.$

The probability that $r_{12}$ is less than some specified value may be obtained from
Tables of the Correlation Coefficient (F. N. David, [1]), or, where these are not
sufficiently detailed, by using R. A. Fisher's $z'$-transformation for $r_{12}$ [4].

The cases considered are (a), (b), (c), (d) as defined above. The power
functions of the three different tests (all based upon the equal tail areas of their
distributions) are given in Table III. The figures for $r_{12}$ in the brackets are
those obtained by the $z'$-transformation approximation.

An examination of Tables II and III brings out the following points:

(1) For reasons given above, the $R_2$ test based on equal tail area critical
regions is very slightly biased; the amount of this bias for the case $n = 10$,
$\rho_0 = 0.6$, $\alpha = 0.05$ is shown in Table IV. This shows that the power of the $R_2$
test is less than 0.05 in the fifth or sixth decimal places for $0.59 < \rho_t < 0.60$.
As a result this test is very slightly less powerful than the other two tests for
alternatives with $\rho_t$ slightly less than $\rho_0$. The effect is, however, of little
importance.

(2) Except in this short range of $\rho_t$, we find that

$\beta(\rho_t \mid R_2) > \beta(\rho_t \mid R_1) > \beta(\rho_t \mid r_{12}).$

That is to say, the power function of the $R_2$ test never lies below those of the
$R_1$ and $r_{12}$ tests, and that of the $R_1$ test never lies below that of the $r_{12}$ test.

(3) The gain in sensitivity as measured by the chance that the test will
detect that $\rho_t \ne \rho_0$ is, however, very small. Further, $R_1$ may only be used if
it is known that $\sigma_1 = \sigma_2$ and $R_2$ if it is known in addition that $\xi_1 = \xi_2$. It will
only be in rather special problems that the statistician can feel confident that
such assumptions are justified. We will therefore probably prefer the test based
on the ordinary product moment correlation coefficient $r_{12}$, since the slight loss
in power will be felt to be outweighed by the gain in simplicity. It is, however,
only after an objective comparison of the consequences of applying the three
tests that a definite opinion on these points can be reached.


TABLE III

Comparison of the power functions of $r_{12}$, $R_1$ and $R_2$


TABLE IV

  $\rho_t$     $\beta'(\rho_t|R_2)$    $\beta''(\rho_t|R_2)$    $\beta(\rho_t|R_2)$
  0.5                          .0093           .0673
  0.590        .0274235        .0225806        .0500041
  0.591        .0271778        .0228190        .0499968
  0.592        .0269359        .0230578        .0499937
  0.593        .0266934        .0232976        .0499910
  0.594        .0264615        .0235337        .0499852
  0.595        .0262096        .0237798        .0499894
  0.596        .0259677        .0240222        .0499899
  0.597        .0257257        .0242651        .0499908
  0.598        .0254838        .0245107        .0499945
  0.599        .0252419        .0247540        .0499959
  0.6          .025            .025            .05


6. Summary. Various hypotheses relating to a population of two normal
correlated variates have been considered and the appropriate test criteria for
each hypothesis have been derived by the likelihood ratio method. The
distributions of the likelihood ratio criteria or of monotonic functions of them have
been obtained with the aid of transformation (14). References have been given
to tables from which significance levels for use in conjunction with the tests
may be obtained; a new table of significance levels for the tests of $H_4$ and $H_5$
was given.

The power functions of $r_{12}$, $R_1$ and $R_2$ have been compared; from these power
functions it was concluded that $R_1$ and $R_2$ are suitable respectively for testing
the hypothesis when $\sigma_1 = \sigma_2$ and when, in addition, $\xi_1 = \xi_2$.

In conclusion, I should like to express my indebtedness to Professor E. S.
Pearson for continued advice and help in the preparation of this paper, and to Dr.
A. Wald and Professor S. S. Wilks for valuable suggestions.






Conditions defining $\Omega$ and $\omega$ together with the likelihood criteria appropriate for testing the hypotheses $H_i$
ON A LEAST SQUARES ADJUSTMENT OF A SAMPLED FREQUENCY
TABLE WHEN THE EXPECTED MARGINAL TOTALS ARE KNOWN

By W. Edwards Deming and Frederick F. Stephan 


1. Introduction. There are situations in sampling wherein the data
furnished by the sample must be adjusted for consistency with data obtained from
other sources or with deductions from established theory. For example in the
1940 census of population a problem of adjustment arises from the fact that
although there will be a complete count of certain characters for the individuals
in the population, considerations of efficiency will limit to a sample many of
the cross-tabulations (joint distributions) of these characters. The tabulations
of the sample will be used to estimate the results that would have been obtained
from cross-tabulations of the entire population.¹ The situation is shown in
Fig. 1 in parallel tables for the universe and for the sample. For the universe
the marginal totals $N_{i.}$ and $N_{.j}$ are known, but not the cell frequencies $N_{ij}$;
for the sample, however, tabulation gives both the cell frequencies $n_{ij}$ and the
marginal totals $n_{i.}$ and $n_{.j}$.

In estimating any cell frequency of the universe, such as $N_{ij}$, three
possibilities present themselves; from the sample one may make an estimate from
the $i$th row alone, another from the $j$th column alone, and still another from the
over-all ratio; specifically, the three estimates would be $n_{ij}N_{i.}/n_{i.}$,
$n_{ij}N_{.j}/n_{.j}$, and $n_{ij}N/n$. As a result of sampling errors these will not be identical
except by accident, and though any of them by itself may be considered
accurate enough, still, if the whole $r \times s$ table of universe cell frequencies were so
estimated, the marginal totals would not come out right. In this paper we
present a rapid method of adjustment, which in effect combines all three of the
estimates just mentioned, and at the same time enforces agreement with the
marginal totals. The method is extended to varying degrees of cross-tabulation
in three dimensions.
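The three unadjusted estimates can be illustrated with a small sketch. The $2 \times 2$ sample counts and universe marginal totals below are hypothetical, as is the function name `three_estimates`; the formulas assumed are $n_{ij}N_{i.}/n_{i.}$, $n_{ij}N_{.j}/n_{.j}$ and $n_{ij}N/n$.

```python
def three_estimates(sample, row_totals_N, col_totals_N):
    """Row-ratio, column-ratio and over-all-ratio estimates of each universe cell."""
    n_row = [sum(row) for row in sample]               # n_i.
    n_col = [sum(col) for col in zip(*sample)]         # n_.j
    n = sum(n_row)                                     # sample size
    N = sum(row_totals_N)                              # universe size
    by_row = [[c * row_totals_N[i] / n_row[i] for c in row]
              for i, row in enumerate(sample)]
    by_col = [[c * col_totals_N[j] / n_col[j] for j, c in enumerate(row)]
              for row in sample]
    overall = [[c * N / n for c in row] for row in sample]
    return by_row, by_col, overall

# Hypothetical sample table and known universe marginal totals.
sample = [[20, 30], [25, 25]]
by_row, by_col, overall = three_estimates(sample, [500, 520], [470, 550])
```

For the first cell the three estimates are 200, about 208.9 and 204. The row-ratio estimates reproduce the row totals but not the column totals, which is the inconsistency the adjustment is meant to remove.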

In any problem of adjustment where the conditions are intricate it is
necessary to have a method that is straight-forward and self-checking; this becomes
imperative when we realize that in the three-dimensional Case VII of the
problem now at hand (vide infra), any adjustment in one cell must be balanced
by adjustments in at least seven others. The method of least squares is one
possible procedure for effecting an adjustment and at the same time enforcing
certain conditions among the marginal totals. It is essentially a scheme for

¹ Examples will occur in the 1940 census publications. Further discussion of this
problem and of the sampling procedure is given by the authors in "The sampling procedure
of the 1940 population census," Jour. Am. Stat. Assn., Vol. 35 (1940), pp 6 5 6
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arriving at a set of calculated or adjusted observations that will satisfy the 
conditions of the problem, and at the same time minimize the sum of 
the weighted squares of the residuals, symbolized as 

(1) $S = \sum w\,(n_c - n_o)^2$

$n_c$ and $n_o$ being the calculated and observed numbers in a cell, and $n_c - n_o$ the
corresponding residual. It is the nature of the conditions imposed on the
adjusted values that distinguishes one type of problem from another. Least
squares has the practical advantage of uniqueness, once the weights of the ob¬ 
servations have been assigned, and it possesses the theoretical dignity of giving 
one kind of “best” estimates under ideal conditions of sampling. For our 
present purpose we shall minimize sums of the form 

(2) $S = \sum (m_i - n_i)^2 / n_i$

$n_i$ being the observed frequency in the ith cell, and $m_i$ the calculated or adjusted
frequency therein. The conditions among the $m_i$ will arise from the fact that
the marginal totals, after adjustment, must agree with their expected values,
namely, the deflated marginal totals of the universe (for example, $m_{i.}$ and $m_{.j}$ as
defined in eqs. (6) and (7)).

By definition, weight and variance are inversely proportional, hence the 
principle of least squares is identical with the minimizing of chi-square. Here 
the variance in the ith cell is $\nu_i(1 - \nu_i/n)$, where $\nu_i$ is the expected number in
that cell, and n is the total number in the sample. Now if $\nu_i$ is sufficiently
well approximated by $n_i$, it follows that if no cell contains an appreciable
fraction of the whole sample (a circumstance requiring a fair sized number of
cells, perhaps 100), the variance may be taken as $\nu_i$ for every i, and the
minimized S can be used as chi-square. But regardless of the number of cells, if
the $n_i$ be not too much different from one another, so that the factor $1 - \nu_i/n$
may be treated as a constant, we still get the least squares solution by
minimizing S as defined in eq. (2).

2. The two dimensional problem. Suppose that the data on two characteristics
(e.g. age and highest grade of school completed) are obtained for each
member of a universe of N individuals, and that tabulations of the data provide
either (a) one set of marginal totals $N_{1.}, N_{2.}, \ldots, N_{r.}$; or (b) in addition, the
marginal totals $N_{.1}, N_{.2}, \ldots, N_{.s}$. The nature of the tabulations is presumed
such that it is not feasible to count the numbers $N_{ij}$ in the cells, as would be
done if one character were crossed with the other. Suppose, however, that for
a sample of n individuals selected in a random manner from the universe, the
two characters are crossed with each other, so that we know not only all the
r + s marginal totals $n_{1.}, \ldots, n_{r.}$ and $n_{.1}, \ldots, n_{.s}$ of the sample but also the
numbers $n_{ij}$ (i = 1, 2, ..., r; j = 1, 2, ..., s). The problem is to estimate the
unknown frequencies $N_{ij}$ in the cells of the universe. This will be done by finding
the calculated or adjusted sample frequencies $m_{ij}$ and then inflating them by the
inverse sampling ratio N/n.

For the least squares solution we seek those values of $m_{ij}$ that minimize 2

(3) $S = \sum (m_{ij} - n_{ij})^2 / n_{ij}$

wherein the $m_{ij}$ are subjected to one of the following sets of conditions:

Case I: One set of marginal totals known. Assume $N_{1.}, N_{2.}, \ldots, N_{r.}$ to be
known. Then we require

(4) $\sum_j m_{ij} = m_{i.}$, i = 1, 2, ..., r.

These r equations constitute r conditions on the adjusted $m_{ij}$.

[Fig. 1: two parallel r X s tables. Universe: cell frequencies $N_{ij}$ not known;
marginal totals $N_{i.}$ and $N_{.j}$ known; N known. Sample: cell frequencies $n_{ij}$ known;
marginal totals $n_{i.}$ and $n_{.j}$ known; n known. i = 1, 2, ..., r; j = 1, 2, ..., s.]

Fig. 1. Showing the System of Notation for the Cell Frequencies and Marginal
Totals of the Universe and the Sample in the Two Dimensional Problem

Case II: Both sets of marginal totals known. Here the adjusted cell frequencies
must satisfy not only condition (4) but also

(5) $\sum_i m_{ij} = m_{.j}$, j = 1, 2, ..., s - 1,

there being now a total of r + s - 1 conditions. In both cases,

(6) $m_{i.} = N_{i.}\,n/N$,

(7) $m_{.j} = N_{.j}\,n/N$.

In other words, $m_{i.}$ and $m_{.j}$ are the deflated marginal totals, $N_{i.}$ and $N_{.j}$
divided by the actual sampling ratio N/n. The $m_{i.}$ and $m_{.j}$ are not independent,
for

(8) $N_{.1} + N_{.2} + \cdots + N_{.s} = N_{1.} + N_{2.} + \cdots + N_{r.} = N$.

It is for this reason that if i runs through all r values in eq. (4), then j can run
through only s - 1 in eq. (5). A similar equation also exists for the marginal
totals of the sample, namely,

(9) $n_{.1} + n_{.2} + \cdots + n_{.s} = n_{1.} + n_{2.} + \cdots + n_{r.} = n$.

2 The sign $\sum$ will denote summation over all possible cells, unless otherwise noted.
$\sum_i$ will denote summation over all values of i, and similarly for an inferior j or
k. A dot, as in $m_{.j}$, will signify the result of summing the $m_{ij}$ over all values of i
in the jth column.

Solution of the two dimensional Case I. Assuming that the adjusted values
of the $m_{ij}$ have been found, let each take on a small variation $\delta m_{ij}$; then the
differentials of eqs. (3) and (4) show that

(10) $\tfrac{1}{2}\,\delta S = \sum \{(m_{ij} - n_{ij})/n_{ij}\}\,\delta m_{ij} = 0$ (one equation),

(11) $\sum_j \delta m_{ij} = 0$, i = 1, 2, ..., r (r equations).

Multiply now eq. (11_i) by the arbitrary Lagrange multiplier $-\lambda_{i.}$, and add eqs.
(10) and (11) to obtain

(12) $\sum \{(m_{ij} - n_{ij})/n_{ij} - \lambda_{i.}\}\,\delta m_{ij} = 0$ (one equation).

By the usual argument, one may now set each brace equal to zero, recognizing
that the r Lagrange multipliers are then no longer arbitrary but must satisfy
the relation

(13) $m_{ij} = n_{ij}(1 + \lambda_{i.})$.

The adjusted frequencies $m_{ij}$ can be computed at once as soon as the $\lambda_{i.}$ are
found. To evaluate them one may rewrite the conditions (4) using the right-hand
member of (13) for $m_{ij}$, obtaining

(14) $m_{i.} = n_{i.}(1 + \lambda_{i.})$.

Another way to arrive at this same relation is to sum each member of eq. (13)
in the ith row. However obtained, $\lambda_{i.}$ is now known, since $m_{i.}$ and $n_{i.}$ are
known, and in fact eq. (13) now gives

(15) $m_{ij} = n_{ij}(m_{i.}/n_{i.})$.

The adjustment is thus a simple proportionate one by rows, the cells in any one 
row all being raised or lowered by the proportionate adjustment in the row total. 
Case I thus amounts to r independent one dimensional proportionate adjust¬ 
ments, one for each row, and any one or all may be carried out, as desired. 
This result can be obtained by a simpler approach but is presented in this way 
for consistency with later cases. 

The minimized sum of squares may be computed directly, or from the row
totals by seeing that

(16) $S = \sum_i (m_{i.} - n_{i.})^2 / n_{i.}$.

The term $(m_{i.} - n_{i.})^2/n_{i.}$ for the ith row may be considered separately, and
used as $\chi^2$ with s - 1 degrees of freedom, or all rows may be combined into
the minimized S as given in eq. (16), and used as $\chi^2$ with r(s - 1) degrees of
freedom.
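The proportionate adjustment of eq. (15) and the two ways of computing the minimized S can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the 2 X 3 table `n` and the expected row totals `m_rows` are hypothetical figures, not taken from the paper, and the function names are chosen here for convenience.

```python
def adjust_by_rows(n, row_targets):
    """Eq. (15): scale every cell of row i by m_i. / n_i. ."""
    adjusted = []
    for row, target in zip(n, row_targets):
        factor = target / sum(row)
        adjusted.append([cell * factor for cell in row])
    return adjusted


def s_direct(n, m):
    """Eq. (3): sum over all cells of (m_ij - n_ij)^2 / n_ij."""
    return sum((mc - nc) ** 2 / nc
               for n_row, m_row in zip(n, m)
               for nc, mc in zip(n_row, m_row))


def s_from_row_totals(n, row_targets):
    """Eq. (16): sum over rows of (m_i. - n_i.)^2 / n_i. ."""
    return sum((t - sum(row)) ** 2 / sum(row)
               for row, t in zip(n, row_targets))


# Hypothetical sample frequencies and expected row totals (assumed data).
n = [[30, 50, 20], [10, 25, 15]]
m_rows = [110, 45]

m = adjust_by_rows(n, m_rows)
print(m)
print(s_direct(n, m), s_from_row_totals(n, m_rows))
```

Both computations of S should agree up to floating-point rounding, which is the content of eq. (16).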

Solution of the two dimensional Case II. In addition to eqs. (11) we now
have also

(17) $\sum_i \delta m_{ij} = 0$, j = 1, 2, ..., s - 1,

which comes by differentiating eqs. (5). By addition of eqs. (10), (11), and
(17), after multiplying eq. (11_i) by $-\lambda_{i.}$ and eq. (17_j) by $-\lambda_{.j}$, we obtain

(18) $\sum \{(m_{ij} - n_{ij})/n_{ij} - \lambda_{i.} - \lambda_{.j}\}\,\delta m_{ij} = 0$.

Equating each brace to zero, as before, we find that

(19) $m_{ij} = n_{ij}(1 + \lambda_{i.} + \lambda_{.j})$

wherein $\lambda_{.s}$ is to be counted 0. The adjustment is now no longer proportionate
by rows, but involves every cell.

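To evaluate the Lagrange multipliers in eq. (19) we may sum the two members
downward and across in Fig. 1 and obtain the r + s - 1 normal equations

(20) $n_{i.}\lambda_{i.} + \sum_j n_{ij}\lambda_{.j} = m_{i.} - n_{i.}$, i = 1, 2, ..., r,

     $\sum_i n_{ij}\lambda_{i.} + n_{.j}\lambda_{.j} = m_{.j} - n_{.j}$, j = 1, 2, ..., s - 1.

These can be reduced for numerical computation. The top row solved for
$\lambda_{i.}$ gives

(21) $\lambda_{i.} = (1/n_{i.})\,[\,m_{i.} - n_{i.} - \sum_j n_{ij}\lambda_{.j}\,]$

whereupon by substitution into the bottom row of eqs. (20) we arrive at the
s - 1 normal equations

(22)
$$
\begin{array}{cccc|c}
\lambda_{.1} & \lambda_{.2} & \cdots & \lambda_{.s-1} & = 1 \\
n_{.1} - \sum_i \frac{n_{i1}^2}{n_{i.}} & -\sum_i \frac{n_{i1} n_{i2}}{n_{i.}} & \cdots & -\sum_i \frac{n_{i1} n_{i,s-1}}{n_{i.}} & m_{.1} - \sum_i \frac{n_{i1} m_{i.}}{n_{i.}} \\
 & n_{.2} - \sum_i \frac{n_{i2}^2}{n_{i.}} & \cdots & -\sum_i \frac{n_{i2} n_{i,s-1}}{n_{i.}} & m_{.2} - \sum_i \frac{n_{i2} m_{i.}}{n_{i.}} \\
 & & \ddots & \vdots & \vdots \\
 & & & n_{.s-1} - \sum_i \frac{n_{i,s-1}^2}{n_{i.}} & m_{.s-1} - \sum_i \frac{n_{i,s-1} m_{i.}}{n_{i.}} \\
 & & & & 0
\end{array}
$$

Because of symmetry in the coefficients, those below the diagonal are not shown;
indeed, in a systematic computation, they are not used. The 0 in the bottom
row is appended for the computation of the minimized S, if desired. The
number of Lagrange multipliers to be solved for directly is s - 1, and the
remaining ones come by substitution into eq. (21), $\lambda_{.s}$ being counted 0.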

A simple procedure for calculating the coefficients in the normal equations
(22) is to set up a preparatory table by dividing each $n_{ij}$ in the ith row by $\sqrt{n_{i.}}$;
also to write down $m_{i.}/\sqrt{n_{i.}}$ for that row, for use on the right-hand side of the
normal equations (compare Tables I and II). In machine calculation the
constant divisor $\sqrt{n_{i.}}$ would be left on the keyboard until the entire ith row is
divided; or, if reciprocal multiplication is preferred, the multiplier $1/\sqrt{n_{i.}}$ would
be left on. From this preparatory table, the cumulation of squares and
cross-products in the vertical gives the required summations for the coefficients. The
sum check would be applied in the usual manner.

3. A numerical example of the two dimensional Case II. The fact is that 
in practice one need not bother about forming and solving the normal equations 
because they will be displaced by a simplifying iterative procedure, to be ex¬ 
plained in a later section. For illustration, however, we may do an example 
both ways, first using the normal equations and the adjustment (19), later on 
accomplishing the same results by the quicker method. 

We may start with the unitalicized numbers in the 4 X 6 array of Table I,
assuming these to be the sampling frequencies $n_{ij}$ to be adjusted. Actually,
they were obtained by deflating 1/20th (for a supposed 5 per cent sample) the
New England age X state table on p. 1108 of vol. 2 of the Fifteenth Census of
the U. S., 1930, then varying the deflated values by chance with Tippett's
numbers to get our sampling frequencies $n_{ij}$. The italicized entries in Table I
represent the final (adjusted) $m_{ij}$, and it is these that we now set out to get.
We start off with the sampling frequencies $n_{ij}$ and the known marginal totals
$m_{1.}$, $m_{2.}$, etc., where $m_{i.} = N_{i.}n/N$, $m_{.j} = N_{.j}n/N$, as in eqs. (6) and (7).
The Lagrange multipliers shown along the left-hand and top borders arise in the
calculations now to be undertaken.

Table II is the preparatory table, advised at the close of the last section. It
is derived from Table I by dividing the ith row of sample frequencies by $\sqrt{n_{i.}}$.
For example, the entry 8.64 in the cell i = 3, j = 2 comes by dividing 419 by
$\sqrt{2352}$, 419 being the entry in the cell of the same indices in Table I, and 2352
being the sum of the third row. The sums at the bottom and right-hand side
are for checking the formation of the normal equations. The cumulations of
squares and cross-products along the vertical give the summations required for
the normal eqs. (22), which now appear numerically as eqs. (23).

(23)
No.   $\lambda_{.1}$   $\lambda_{.2}$   $\lambda_{.3}$    = 1 (X $10^{-2}$)
1       7413    -3549    -2354     3197
2                4441     -544     2356
3                         3129    -3222
4                                     0

Performing the solution by any favorite procedure one will obtain

(24) $\lambda_{.1} = .01182$, $\lambda_{.2} = .01490$, $\lambda_{.3} = .00119$
TABLE I

A table of artificial sample frequencies, an artificial 5 percent sample of native
white persons of native white parentage attending school, by age by state, New
England, 1930. The adjusted frequency $m_{ij}$ in each cell (line marked m, italicized
in the original) is shown just below the corresponding sample frequency $n_{ij}$
(line marked n)

                                          Age
                                 7 to 13  14 & 15  16 & 17  18 to 20
                           j =         1        2        3         4    n_i.
                  lambda_.j =      .0118    .0149    .0012         0    m_i.
State           i   lambda_i.
Maine           1    -.0146   n     3623      781      557       313    5274
                              m     3613      781      550       308    5252
New Hampshire   2    -.0003   n     1570      395      251       155    2371
                              m     1588      401      251       155    2395
Vermont         3     .0234   n     1553      419      264       116    2352
                              m     1608      435      270       119    2432
Massachusetts   4    -.0162   n    10538     2455     1706      1160   15859
                              m    10492     2452     1680      1141   15766
Rhode Island    5    -.0230   n     1681      353      171       154    2359
                              m     1662      350      167       150    2330
Connecticut     6    -.0034   n     3882      857      544       339    5622
                              m     3915      867      543       338    5662

                       n_.j        22847     5260     3493      2237   33837
                       m_.j        22877     5285     3462      2213   33837

The adjusted $m_{ij}$ (italicized) are rounded off, hence when summed may occasionally
disagree a unit or so with the expected marginal totals (also italicized); the latter arise
by deflation from the universe rather than by direct addition of the $m_{ij}$.


whereupon by substitution into eq. (21) comes

(25) $\lambda_{1.} = -.0146$, $\lambda_{2.} = -.0003$, $\lambda_{3.} = +.0234$,
     $\lambda_{4.} = -.0162$, $\lambda_{5.} = -.0230$, $\lambda_{6.} = -.0034$.

The next step is to compute the $m_{ij}$ by eq. (19). Table I is now bordered
with the Lagrange multipliers for a convenient arrangement of the factors
required, and the calculation is completed. It will be noted that, for example,

(26) $m_{32} = 419(1 + .0234 + .0149) = 435$.

The $m_{ij}$ thus calculated are shown italicized in Table I. The marginal totals,
found by adding the $m_{ij}$ just calculated, do not agree exactly everywhere with
the expected totals, because of rounding off to integers: the errors of closure, 
however, are slight, and it is a simple matter to raise or lower some of the larger 
cells by a unit or two to force exact satisfaction of the conditions, if this is 
desired. 
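As a check on the arithmetic, the example can be retraced with a short Python sketch that forms the full set of r + s - 1 normal equations (20) (rather than the reduced set (22)), solves them by Gaussian elimination, and applies eq. (19). Several things here are assumptions: the function names are chosen for illustration; the expected totals are the rounded integers printed in the tables, not the unrounded deflated totals; and the two Massachusetts cells for ages 16 & 17 and 18 to 20 are inferred from the row and column totals. The multipliers may therefore differ slightly from eqs. (24) and (25).

```python
def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    size = len(b)
    aug = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(size):
        pivot = max(range(col, size), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, size):
            f = aug[r][col] / aug[col][col]
            for c in range(col, size + 1):
                aug[r][c] -= f * aug[col][c]
    x = [0.0] * size
    for r in range(size - 1, -1, -1):
        s = sum(aug[r][c] * x[c] for c in range(r + 1, size))
        x[r] = (aug[r][size] - s) / aug[r][r]
    return x


def least_squares_adjust(n, row_targets, col_targets):
    """Form normal eqs. (20), solve them, and return the m_ij of eq. (19)."""
    r, s = len(n), len(n[0])
    row_n = [sum(row) for row in n]
    col_n = [sum(col) for col in zip(*n)]
    size = r + s - 1                      # lambda_.s is counted 0
    a = [[0.0] * size for _ in range(size)]
    b = [0.0] * size
    for i in range(r):
        a[i][i] = row_n[i]
        b[i] = row_targets[i] - row_n[i]
        for j in range(s - 1):
            a[i][r + j] = n[i][j]
            a[r + j][i] = n[i][j]
    for j in range(s - 1):
        a[r + j][r + j] = col_n[j]
        b[r + j] = col_targets[j] - col_n[j]
    lam = solve(a, b)
    lam_row, lam_col = lam[:r], lam[r:] + [0.0]
    m = [[n[i][j] * (1 + lam_row[i] + lam_col[j]) for j in range(s)]
         for i in range(r)]
    return m, lam_row, lam_col


n = [
    [3623, 781, 557, 313],       # Maine
    [1570, 395, 251, 155],       # New Hampshire
    [1553, 419, 264, 116],       # Vermont
    [10538, 2455, 1706, 1160],   # Massachusetts (last two cells inferred)
    [1681, 353, 171, 154],       # Rhode Island
    [3882, 857, 544, 339],       # Connecticut
]
m_row = [5252, 2395, 2432, 15766, 2330, 5662]   # rounded expected row totals
m_col = [22877, 5285, 3462, 2213]               # rounded expected column totals

m, lam_row, lam_col = least_squares_adjust(n, m_row, m_col)
print(round(m[2][1]))    # cell i = 3, j = 2; compare eq. (26)
```

Under these assumptions the adjusted table satisfies both sets of marginal conditions to within floating-point error, and the cell i = 3, j = 2 comes out close to the 435 of eq. (26).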

TABLE II

This comes by dividing each sample frequency in Table I by the corresponding $\sqrt{n_{i.}}$
(This operation would ordinarily be done a row at a time)

            j = 1       2       3       4    m_i./sqrt(n_i.)      Sum
i = 1       49.89   10.75    7.67    4.31              72.32   144.94
    2       32.24    8.11    5.15    3.18              49.19    97.87
    3       32.02    8.64    5.44    2.39              50.15    98.64
    4       83.68   19.49   13.55    9.21             125.19   251.12
    5       34.61    7.27    3.52    3.17              47.97    96.54
    6       51.77   11.43    7.26    4.52              75.51   150.49
Sum        284.21   65.69   42.59   26.78             420.33   839.60

4. The three dimensional problem. Here the N cards of the universe are
sorted and counted for one and perhaps a second and third characteristic, and
possibly crossed by pairs in various combinations (Cases I-VII). The sample
of n, however, is crossed by all three characteristics, which is to say that the
cell frequencies $n_{ijk}$ are all known (refer to Fig. 2). As before, the adjusted
frequencies are required.

Case I: One set of slice totals known. Assume the slice totals $N_{1..}, N_{2..},
\ldots, N_{r..}$ to be known; the conditions are then

(27) $\sum_{jk} m_{ijk} = m_{i..} = N_{i..}\,n/N$, i = 1, 2, ..., r,

being r in number. The summation to be minimized is

(28) $S = \sum (m_{ijk} - n_{ijk})^2 / n_{ijk}$

being similar to that in eq. (3), except that now there are three indices to be
summed over instead of two. Following a procedure similar to that used before,
we differentiate eqs. (27) and (28) and introduce the r Lagrange multipliers $\lambda_{i..}$
with eq. (27). The steps are identical with those of the two dimensional Case I,
and the result is at once

(29) $m_{ijk} = n_{ijk}(1 + \lambda_{i..}) = n_{ijk}(m_{i..}/n_{i..})$.

This adjustment, like that shown by eq. (15), is a simple proportionate one, but
this time by slices rather than by rows. All cell frequencies having the same
i index are raised or lowered in the same proportion.



Fig. 2. Showing the System of Notation for the Cell Frequencies and Marginal
Totals in the Three Dimensional Sample

Case II: Two sets of slice totals known. Here, in addition to the slice totals
of Case I we know also

$N_{.1.}, N_{.2.}, \ldots, N_{.s.}$,

whence arise the s - 1 additional conditions

(30) $\sum_{ik} m_{ijk} = m_{.j.} = N_{.j.}\,n/N$, j = 1, 2, ..., s - 1.

Using the Lagrange multiplier $\lambda_{.j.}$ here, and $\lambda_{i..}$ with eq. (27) as before, we
find that

(31) $m_{ijk} = n_{ijk}(1 + \lambda_{i..} + \lambda_{.j.})$

in which $\lambda_{.s.}$ is to be counted zero. This adjustment is proportionate by tubes,
the ratio $m_{ijk}/n_{ijk}$ being constant along the ijth tube and in fact equal to
$m_{ij.}/n_{ij.}$, independent of k. Unfortunately we do not here know the face totals
$m_{ij.}$ and are unable to make use of the proportionality as we shall in Case IV.

To solve for the r + s - 1 Lagrange multipliers we sum the members of eq.
(31) over j and then over i and arrive at the normal equations

(32) $n_{i..}\lambda_{i..} + \sum_j n_{ij.}\lambda_{.j.} = m_{i..} - n_{i..}$, i = 1, 2, ..., r,

     $\sum_i n_{ij.}\lambda_{i..} + n_{.j.}\lambda_{.j.} = m_{.j.} - n_{.j.}$, j = 1, 2, ..., s - 1.

These can be reduced to s - 1 equations in precisely the same way that eqs.
(20) were reduced, but because of the iterative process to come further on, we
shall not pursue the reduction here.

Case III: All three sets of slice totals known. All slice totals

$N_{1..}, N_{2..}, \ldots, N_{r..}$

$N_{.1.}, N_{.2.}, \ldots, N_{.s.}$

$N_{..1}, N_{..2}, \ldots, N_{..t}$

now being known, in addition to conditions (27) and (30) we require here

(33) $\sum_{ij} m_{ijk} = m_{..k} = N_{..k}\,n/N$, k = 1, 2, ..., t - 1,

which makes a total of r + (s - 1) + (t - 1) or r + s + t - 2 conditions.
The same kind of manipulation as used heretofore gives

(34) $m_{ijk} = n_{ijk}(1 + \lambda_{i..} + \lambda_{.j.} + \lambda_{..k})$

with $\lambda_{.s.}$ and $\lambda_{..t}$ to be counted zero. The adjustment is no longer
proportionate by slices or tubes, but involves every cell. In practice, once the normal
equations are solved and the Lagrange multipliers worked out, one proceeds
very much as in the two dimensional Case II: for each of the t slices,
corresponding to the t values of k, there will be a two dimensional adjustment, the
1 in eq. (19) being replaced now by $1 + \lambda_{..k}$.

The normal equations for the Lagrange multipliers can be found by
performing double summations on eq. (34). The result is

(35) $n_{i..}\lambda_{i..} + \sum_j n_{ij.}\lambda_{.j.} + \sum_k n_{i.k}\lambda_{..k} = m_{i..} - n_{i..}$, i = 1, 2, ..., r,

     $\sum_i n_{ij.}\lambda_{i..} + n_{.j.}\lambda_{.j.} + \sum_k n_{.jk}\lambda_{..k} = m_{.j.} - n_{.j.}$, j = 1, 2, ..., s - 1,

     $\sum_i n_{i.k}\lambda_{i..} + \sum_j n_{.jk}\lambda_{.j.} + n_{..k}\lambda_{..k} = m_{..k} - n_{..k}$, k = 1, 2, ..., t - 1.

If these calculations were to be carried out, one would simplify the computation
by solving the top row for $\lambda_{i..}$, getting

(36) $\lambda_{i..} = (1/n_{i..})\,\{\,m_{i..} - n_{i..} - \sum_j n_{ij.}\lambda_{.j.} - \sum_k n_{i.k}\lambda_{..k}\,\}$

and then substituting this into the middle and last rows of eqs. (35) to get a
reduced set of s + t - 2 normal equations for the Lagrange multipliers $\lambda_{.j.}$
and $\lambda_{..k}$, the numerical values of which when set back into eq. (36) give the $\lambda_{i..}$.
In all the summations of eqs. (35) and (36), $\lambda_{.s.}$ and $\lambda_{..t}$ would be counted zero.
But here again, the iterative process to be explained later will displace the use
of normal equations, so actually we are not interested in reducing them.

Case IV: One set of face totals known. It may be that the rs face totals

$N_{11.}, N_{12.}, \ldots, N_{ij.}, \ldots, N_{rs.}$

are known from crossing the i and j characters in the universe. The conditions
are then

(37) $\sum_k m_{ijk} = m_{ij.} = N_{ij.}\,n/N$.

The adjustment here turns out to be

(38) $m_{ijk} = n_{ijk}(1 + \lambda_{ij.})$;

but by summing both sides over the index k to evaluate $\lambda_{ij.}$ it is seen that

(39) $m_{ij.} = n_{ij.}(1 + \lambda_{ij.})$

whence

(40) $m_{ijk} = n_{ijk}(m_{ij.}/n_{ij.})$.

This adjustment is thus proportionate by tubes, like that in eq. (31), though
here the factor $m_{ij.}/n_{ij.}$ is known and eq. (40) can be applied at once.
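Eq. (40) may be illustrated by a minimal Python sketch. The 2 X 2 X 3 sample `n` (stored as `n[i][j][k]`, so that each innermost list is one ij tube) and the expected face totals `m_face` are hypothetical figures assumed for illustration, as is the function name.

```python
def adjust_by_tubes(n, face_targets):
    """Eq. (40): m_ijk = n_ijk * (m_ij. / n_ij.) for every k in the ij-th tube."""
    return [[[cell * face_targets[i][j] / sum(tube) for cell in tube]
             for j, tube in enumerate(face)]
            for i, face in enumerate(n)]


# Hypothetical sample frequencies n[i][j][k] and expected face totals m_ij. .
n = [[[4, 6, 10], [5, 5, 10]],
     [[8, 2, 10], [3, 7, 20]]]
m_face = [[22, 18], [21, 33]]

m = adjust_by_tubes(n, m_face)
print(m)
```

Each tube is adjusted independently of the others, so any subset of tubes may be treated in this way without reference to the rest.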

Case V: One set of face totals, and one set of slice totals known. Sometimes, in
addition to the rs face totals of Case IV, the slice totals

$N_{..1}, N_{..2}, \ldots, N_{..t}$

will also be known, in which circumstances the conditions (37) are to be
accompanied by

(41) $\sum_{ij} m_{ijk} = m_{..k} = N_{..k}\,n/N$, k = 1, 2, ..., t - 1.

The same procedure as previously applied yields now

(42) $m_{ijk} = n_{ijk}(1 + \lambda_{ij.} + \lambda_{..k})$

with $\lambda_{..t}$ to be counted zero. Summations performed over k, and then over i
and j together, give the normal equations

(43) $n_{ij.}\lambda_{ij.} + \sum_k n_{ijk}\lambda_{..k} = m_{ij.} - n_{ij.}$,

     $\sum_{ij} n_{ijk}\lambda_{ij.} + n_{..k}\lambda_{..k} = m_{..k} - n_{..k}$.

The number of equations is rs + t - 1, since $\lambda_{..t}$ does not exist. As before,
a simplification can be effected by solving the top row for $\lambda_{ij.}$ and making a
substitution into the lower one, but because of the great advantage of the
iterative process to be seen further on, we shall not carry out the reduction.

Before going on it might be noted that although this case is three dimensional,
it reduces to the two dimensional Case II if one considers that ij. is one index
running through the values 11, 12, ..., 21, 22, ..., rs, and that ..k is a second
index running through the values 1, 2, ..., t. This can be seen by the
similarity between eqs. (43) and (20).
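The reduction of Case V to a two dimensional problem can be sketched as follows. The code flattens the pair (i, j) into a single row index and keeps k as the column index; for the adjustment itself it uses alternating proportionate adjustments of rows and columns (the iterative procedure the text defers to a later section) instead of the normal equations (43). The data, the names, and the fixed number of cycles are all assumptions made for illustration.

```python
def ipf_2d(table, row_targets, col_targets, cycles=50):
    """Alternate proportionate row and column adjustments of a two-way table."""
    m = [[float(x) for x in row] for row in table]
    for _ in range(cycles):
        m = [[x * t / sum(row) for x in row]
             for row, t in zip(m, row_targets)]
        col_sums = [sum(col) for col in zip(*m)]
        m = [[x * t / c for x, t, c in zip(row, col_targets, col_sums)]
             for row in m]
    return m


# Hypothetical 2 x 2 x 2 sample n[i][j][k], expected face totals m_ij.,
# and expected slice totals m_..k (both sets add to the same grand total).
n = [[[10, 20], [30, 40]], [[50, 60], [70, 80]]]
m_face = [[33, 66], [115, 146]]
m_slice = [170, 190]

s = len(n[0])
flat = [tube for face in n for tube in face]           # rows indexed by (i, j)
flat_targets = [t for face in m_face for t in face]    # m_ij. in the same order
flat_m = ipf_2d(flat, flat_targets, m_slice)
m = [flat_m[i * s:(i + 1) * s] for i in range(len(n))]  # back to m[i][j][k]
print(m)
```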

Case VI: Two sets of face totals known. If in addition to the face totals of
Case IV, the face totals

$N_{.11}, N_{.12}, \ldots, N_{.st}$

are also known from further crossing the j and k characters in the universe, we
shall require

(44) $\sum_i m_{ijk} = m_{.jk} = N_{.jk}\,n/N$, j = 1, 2, ..., s; k = 1, 2, ..., t - 1,

in addition to the conditions (37). In place of eq. (40) of Case IV we now
find that

(45) $m_{ijk} = n_{ijk}(1 + \lambda_{ij.} + \lambda_{.jk})$

in which $\lambda_{.jt}$ is to be counted zero for all j. No simple relation such as eq. (40)
is possible here, because the adjustment is not proportionate by tubes; the
Lagrange multipliers must be evaluated. This can be accomplished by summing
the members of eq. (45) over k and i in turn, resulting in the normal equations

(46) $n_{ij.}\lambda_{ij.} + \sum_k n_{ijk}\lambda_{.jk} = m_{ij.} - n_{ij.}$,

     $\sum_i n_{ijk}\lambda_{ij.} + n_{.jk}\lambda_{.jk} = m_{.jk} - n_{.jk}$.

Since $\lambda_{.jt}$ does not exist for any values of j, the number of equations is
rs + s(t - 1) = s(r + t - 1). They break up at once into s sets each of
r + t - 1 equations, one set for every j value. In fact, the problem can be
considered as s sets of the two dimensional Case II. Any one value of j gives
a slice, which can be looked upon as fulfilling the specifications of the two
dimensional Case II. Each set of normal equations can be reduced in the same
manner that eqs. (20) were reduced.

Case VII: All three sets of face totals known. All totals now being known,
we require

(37) $\sum_k m_{ijk} = m_{ij.} = N_{ij.}\,n/N$, i = 1, 2, ..., r; j = 1, 2, ..., s,

(44) $\sum_i m_{ijk} = m_{.jk} = N_{.jk}\,n/N$, j = 1, 2, ..., s; k = 1, 2, ..., t - 1,

(47) $\sum_j m_{ijk} = m_{i.k} = N_{i.k}\,n/N$, i = 1, 2, ..., r - 1; k = 1, 2, ..., t - 1.

The adjusting relation is

(48) $m_{ijk} = n_{ijk}(1 + \lambda_{ij.} + \lambda_{.jk} + \lambda_{i.k})$

in which $\lambda_{.jt}$ is to be counted zero for any j, $\lambda_{r.k}$ for any k, and $\lambda_{i.t}$ for any i.
The normal equations for the Lagrange multipliers are

(49) $n_{ij.}\lambda_{ij.} + \sum_k n_{ijk}\lambda_{.jk} + \sum_k n_{ijk}\lambda_{i.k} = m_{ij.} - n_{ij.}$,

     $\sum_i n_{ijk}\lambda_{ij.} + n_{.jk}\lambda_{.jk} + \sum_i n_{ijk}\lambda_{i.k} = m_{.jk} - n_{.jk}$,

     $\sum_j n_{ijk}\lambda_{ij.} + \sum_j n_{ijk}\lambda_{.jk} + n_{i.k}\lambda_{i.k} = m_{i.k} - n_{i.k}$,

being rs + st + rt - r - s - t + 1 in number. They can be reduced in the
same way that previous normal equations have been reduced; but here again,
the iterative process will render the use of normal equations unnecessary, except
for theoretical purposes, e.g. justification of the iterative process.


5. A simplified procedure: iterative proportions. It is well known in least
squares that the number of Lagrange multipliers in any problem is equal to the
number of conditions imposed on the adjustment. Here the conditions have
appeared in sets, depending on which marginal totals are involved. By a
comparison of eqs. (15) and (29) on the one hand, with eqs. (19), (31), (34), (42),
(45), and (48) on the other, we see that wherever there was only one set of
marginal totals involved we came out with a proportionate adjustment, but
that in all other cases it was not so; the Lagrange multipliers were
unfortunately related to one another through normal equations. We now make
the observation, however, that as a first approximation the adjustments can
all be considered proportionate, and we shall be able to write down an expression
for the error in this approximation, and shall be able to eliminate it by a
succession of proportionate adjustments.

Take the two dimensional Case II for an example. In eq. (21) one may
recognize $(1/n_{i.}) \sum_j n_{ij}\lambda_{.j}$ as a weighted average of $\lambda_{.j}$ for the ith row. There
will be a weighted average of $\lambda_{.j}$ for the first row, another for the second,
one for each value of i; consequently one may appropriately speak of the i
average of $\lambda_{.j}$, writing it i-av. $\lambda_{.j}$. Substituting from eq. (21) into (19) one
then sees the adjustment (19) appear as

(50) $m_{ij} = n_{ij}(m_{i.}/n_{i.} + \lambda_{.j} - \text{i-av.}\ \lambda_{.j})$.

If, on the other hand, $\lambda_{.j}$ had been eliminated from eqs. (20), instead of $\lambda_{i.}$,
the result would have been

(51) $m_{ij} = n_{ij}(m_{.j}/n_{.j} + \lambda_{i.} - \text{j-av.}\ \lambda_{i.})$.

From either eq. (50) or (51) it is clear why the adjustment (19) is not
proportionate by rows or columns, and why Case II does not break up into r or s sets
of Case I: the reason is that $\lambda_{.j}$ in any cell is not necessarily equal to the average
$\lambda_{.j}$ for that row, nor is $\lambda_{i.}$ in any cell necessarily equal to the average $\lambda_{i.}$ for
that column. If nevertheless one were to make the simple proportionate
adjustment

(52) $m'_{ij} = n_{ij}(m_{i.}/n_{i.})$

along the horizontal in the ith row, the horizontal conditions (4) will be
enforced but not the vertical ones (5); i.e., it will be found that $m'_{i.} = m_{i.}$, but
that usually not all $m'_{.j} = m_{.j}$. This is because eq. (52) effects only a partial
adjustment, each $m'_{ij}$ being in error through the disparity between the $\lambda_{.j}$ proper
to the jth column, and the average of all the $\lambda_{.j}$ for the ith row, as seen in
eq. (50). This error can then be diminished by turning the process around and
subjecting these $m'_{ij}$ to a proportionate adjustment in the vertical according to
the equation

(53) $m''_{ij} = m'_{ij}(m_{.j}/m'_{.j})$

which may be considered an application of eq. (51) wherein the disparity
between any $\lambda_{i.}$ and the average $\lambda_{i.}$ for the jth column has been neglected. It is
the vertical conditions that will now be found satisfied, but perhaps not all of
the horizontal ones, because some of the row totals may have been disturbed.
The cycle initiated by eq. (52) is therefore repeated, and the process is
continued until the table reproduces itself and becomes rigid with the satisfaction
of all the conditions, both horizontal and vertical. The final results coincide
with the least squares solution, which is thus accomplished without the use of
normal equations.

Usually two cycles suffice. In practice the work proceeds rapidly, requiring
only about one-seventh as much time as setting up the normal equations and
solving them. Tables III-V show the various stages of the work when
the method of iterative proportions is applied to the sample frequencies of
Table I. It will be noticed that the results of the third approximation (Table V)
are final, since if the process were continued, the table would only reproduce
itself.
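The cycle of eqs. (52) and (53) is easily mechanized. The Python sketch below applies it to the sample frequencies of Table I; it is an illustration under assumptions, not a transcription of the authors' worksheet. The expected totals are the rounded integers printed in the tables, the two Massachusetts cells for ages 16 & 17 and 18 to 20 are inferred from the row and column totals, and the function names and the fixed number of cycles are chosen here for convenience.

```python
def row_step(m, row_targets):
    """Eq. (52): scale each row so that its total equals m_i. ."""
    return [[x * t / sum(row) for x in row]
            for row, t in zip(m, row_targets)]


def col_step(m, col_targets):
    """Eq. (53): scale each column so that its total equals m_.j ."""
    col_sums = [sum(col) for col in zip(*m)]
    return [[x * t / c for x, t, c in zip(row, col_targets, col_sums)]
            for row in m]


def iterative_proportions(n, row_targets, col_targets, cycles=20):
    """Repeat the row-then-column cycle a fixed number of times."""
    m = [[float(x) for x in row] for row in n]
    for _ in range(cycles):
        m = col_step(row_step(m, row_targets), col_targets)
    return m


n = [
    [3623, 781, 557, 313],       # Maine
    [1570, 395, 251, 155],       # New Hampshire
    [1553, 419, 264, 116],       # Vermont
    [10538, 2455, 1706, 1160],   # Massachusetts (last two cells inferred)
    [1681, 353, 171, 154],       # Rhode Island
    [3882, 857, 544, 339],       # Connecticut
]
m_row = [5252, 2395, 2432, 15766, 2330, 5662]
m_col = [22877, 5285, 3462, 2213]

first = row_step(n, m_row)                       # first stage, as in Table III
final = iterative_proportions(n, m_row, m_col)   # repeated cycles

for row in final:
    print([round(x) for x in row])
```

The first stage can be compared cell by cell with Table III, and the final table with the adjusted frequencies of Tables I and V; small differences of a unit are to be expected from rounding.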

The same process can be extended to three or more dimensions with an even
greater relative saving in time. To see how the method of iterative proportions
applies in one of the three dimensional cases, we may go back to Case III. By
the substitution afforded through eq. (36) the adjusting eq. (34) may be put
into the form


TABLE III

The method of iterative proportions applied to the data of Table I. First stage:
A proportionate adjustment by rows by eq. (52). Note that $m'_{i.} = m_{i.}$,
but that $m'_{.j} \neq m_{.j}$

            j = 1       2       3       4    m'_i.    m_i.
i = 1        3608     778     555     312     5253    5252
    2        1586     399     254     157     2396    2395
    3        1606     433     273     120     2432    2432
    4       10476    2441    1696    1153    15766   15766
    5        1660     349     169     152     2330    2330
    6        3910     863     548     341     5662    5662
m'_.j       22846    5263    3495    2235    33839
m_.j        22877    5285    3462    2213            33837

TABLE IV

A continuation of the process initiated in Table III. The figures in Table III
are now adjusted proportionately by columns according to eq. (53). The vertical
totals $m''_{.j}$ and $m_{.j}$ now are equal, but the agreement of the horizontal totals
accomplished in Table III has been slightly disturbed

            j = 1       2       3       4   m''_i.    m_i.
i = 1        3613     781     550     309     5253    5252
    2        1588     401     252     155     2396    2395
    3        1608     435     270     119     2432    2432
    4       10490    2451    1680    1142    15763   15766
    5        1662     350     167     151     2330    2330
    6        3915     867     543     338     5663    5662
m''_.j      22876    5285    3462    2214    33837
m_.j        22877    5285    3462    2213            33837

(54) $m_{ijk} = n_{ijk}(m_{i..}/n_{i..} + \lambda_{.j.} + \lambda_{..k} - \text{i-av.}\ \lambda_{.j.} - \text{i-av.}\ \lambda_{..k})$.

Equally well it could have been written

(55) $m_{ijk} = n_{ijk}(m_{.j.}/n_{.j.} + \lambda_{i..} + \lambda_{..k} - \text{j-av.}\ \lambda_{i..} - \text{j-av.}\ \lambda_{..k})$

or

(56) $m_{ijk} = n_{ijk}(m_{..k}/n_{..k} + \lambda_{i..} + \lambda_{.j.} - \text{k-av.}\ \lambda_{i..} - \text{k-av.}\ \lambda_{.j.})$.

Any of these three equations shows why the adjustment (34) is not
proportional by slices, and why this case does not break up into r or s or t sets of the
three dimensional Case I. As a first approximation it does, as is now clear
from these three equations, and by making successive proportionate
adjustments we may thus arrive at the least squares values. To go about the work
we could first calculate the values of

(57) $m'_{ijk} = n_{ijk}(m_{i..}/n_{i..})$

then

(58) $m''_{ijk} = m'_{ijk}(m_{.j.}/m'_{.j.})$,


TABLE V

The cycle is commenced again. The figures of Table IV are subjected to a
proportionate adjustment by rows, according to eq. (52). And since these results turn
out to be almost a reproduction of Table IV but with both horizontal and vertical
conditions satisfied, they are considered final. The agreement with the $m_{ij}$ in
Table I should be noted

            j = 1       2       3       4  m'''_i.    m_i.
i = 1        3612     781     550     309     5252    5252
    2        1587     401     252     155     2395    2395
    3        1608     435     270     119     2432    2432
    4       10492    2451    1680    1142    15765   15766
    5        1662     350     167     151     2330    2330
    6        3914     867     543     338     5662    5662
m'''_.j     22875    5285    3462    2214    33836
m_.j        22877    5285    3462    2213            33837

followed by

(59) $m'''_{ijk} = m''_{ijk}(m_{..k}/m''_{..k})$.

These three successive adjustments would constitute a cycle, which would then
be repeated in whole or in part until the table becomes rigid with the
satisfaction of all three sets of conditions.
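The cycle of eqs. (57), (58), and (59) may be sketched in Python as below. The 2 X 2 X 2 sample and the three sets of expected slice totals are hypothetical figures assumed for illustration (each set adds to the same grand total), and the function names and the fixed number of cycles are likewise assumptions.

```python
def slice_totals(m, axis):
    """Totals over all cells sharing one index on `axis` (0 = i, 1 = j, 2 = k)."""
    size = (len(m), len(m[0]), len(m[0][0]))
    totals = [0.0] * size[axis]
    for i in range(size[0]):
        for j in range(size[1]):
            for k in range(size[2]):
                totals[(i, j, k)[axis]] += m[i][j][k]
    return totals


def scale_slices(m, targets, axis):
    """One proportionate adjustment by slices, as in eq. (57), (58), or (59)."""
    totals = slice_totals(m, axis)
    return [[[cell * targets[(i, j, k)[axis]] / totals[(i, j, k)[axis]]
              for k, cell in enumerate(tube)]
             for j, tube in enumerate(face)]
            for i, face in enumerate(m)]


def iterative_proportions_3d(n, targets_i, targets_j, targets_k, cycles=50):
    m = n
    for _ in range(cycles):
        m = scale_slices(m, targets_i, 0)   # eq. (57)
        m = scale_slices(m, targets_j, 1)   # eq. (58)
        m = scale_slices(m, targets_k, 2)   # eq. (59)
    return m


# Hypothetical sample n[i][j][k] and expected slice totals (assumed data).
n = [[[10, 20], [30, 40]], [[50, 60], [70, 80]]]
m_i = [110, 250]
m_j = [150, 210]
m_k = [170, 190]

m = iterative_proportions_3d(n, m_i, m_j, m_k)
print(slice_totals(m, 0), slice_totals(m, 1), slice_totals(m, 2))
```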

6. Simplification when only one cell requires adjustment. On occasions it 
happens in sampling work that one is especially interested in one particular cell 
of the universe, and would like to have a result for it in advance before the other 
cells are adjusted. Sometimes it even happens that the others individually 
are of no particular concern. In such circumstances one merely places the cell 
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of interest in one corner of the table by an appropriate interchange of rows and 
columriSj and then compresses the rest of the table into the cells adjacent to it 
In the two dimensional Case II one would thus work with a 2 X 2 table, one 
corner cell being the one of special interest, the other three being the result of 
compression The marginal totals of the row and column belonging to the cell 
of interest are unaffected. For illustration we may suppose that from the 
sample shown in Table I we require only m 61 . We then start with the 2 X 2 
Table VI, which is derived from Table I by compression. Commencing with 
Table VI, one might first adjust by rows according to eq. (52), then by columns 
by eq. (53), One cycle of iterative proportions is sufficient, as is seen in Table 

TABLE VI

Derived from Table I by compression, the cell i = 6, j = 1, requiring adjustment

               j = 1    j = 2-4      n_i.      m_i.
  i = 1-5      18965       9250     28215     28175
  i = 6         3882       1740      5622      5662
  n.j          22847      10990     33837
  m.j          22877      10960               33837


TABLE VII

A proportionate adjustment of Table VI

     Rows adjusted by eq. (52)          Columns adjusted by eq. (53)

     18938     9237    28175            18962     9213    28175
      3910     1752     5662             3915     1747     5662
     22848    10989    33837            22877    10960    33837

Conclusion: $m_{61}$ = 3915


VII, and the value 3915 found for $m_{61}$ is in good agreement with its value shown
in Tables I and V. The scheme of compression provides a quick method of
getting out an advance adjustment for a cell of special interest, and the result
so obtained will ordinarily be in good agreement with what comes later when
and if all the cells are adjusted.
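
The single cycle applied to Table VI can be sketched in a few lines of Python. The sketch assumes that eq. (52) and eq. (53) are the plain proportionate scalings of rows and of columns that the caption of Table VII describes; the function name `proportional_cycle` is hypothetical and is introduced here only for illustration.

```python
def proportional_cycle(cells, row_targets, col_targets):
    """One cycle of iterative proportions on a two-way table.

    Assumption: rows are scaled first, then columns, each by the ratio
    (required marginal total) / (current marginal total).
    """
    # Step 1: scale every row so that it sums to its required total m_i.
    by_rows = []
    for row, target in zip(cells, row_targets):
        factor = target / sum(row)
        by_rows.append([value * factor for value in row])

    # Step 2: scale every column of the row-adjusted table to its total m.j
    col_sums = [sum(col) for col in zip(*by_rows)]
    factors = [t / s for t, s in zip(col_targets, col_sums)]
    by_cols = [[v * f for v, f in zip(row, factors)] for row in by_rows]
    return by_rows, by_cols


# Sampling frequencies and required marginal totals read from Tables VI and VII.
sample = [[18965, 9250],   # i = 1-5
          [3882, 1740]]    # i = 6
row_targets = [28175, 5662]
col_targets = [22877, 10960]

by_rows, by_cols = proportional_cycle(sample, row_targets, col_targets)
m_61 = by_cols[1][0]
print(round(by_rows[1][0]), round(m_61))   # prints: 3910 3915
```

After the column step the row totals are no longer exactly satisfied, but they differ from the required totals by less than one unit, which is why one cycle is enough in this example.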

In the three dimensional Cases II, III, V, VI, and VII, one compresses the
original table to a 2 X 2 X 2 table, and then uses the method of iterative
proportions. (The other cases do not require consideration, since they are
proportionate adjustments wherein one is already at liberty to adjust as few or as
many cells as he likes without altering the equations or the routine.) The same
procedure can be extended to the adjustment of two cells, the only modification
being that in two dimensions we shall compress to a 2 X 3 or a 3 X 3 table
depending on whether the two cells do or do not lie in the same row or column.
In three dimensions we compress to a 2 X 2 X 3, or a 2 X 3 X 3, or a 3 X 3 X 3
table; the first if the two cells lie in the same i, j, or k tube, the second if they
lie in the same slice but not in the same tube, the third if they are in separate
slices.

7. Some remarks on the accuracy of an adjustment. A least squares adjustment
of sampling results must be regarded as a systematic procedure for
obtaining satisfaction of the conditions imposed, and at the same time effecting
an improvement of the data in the sense of obtaining results of smaller variance
than the sample itself, under ideal conditions of sampling from a stable universe.
It must not be supposed that any or all of the adjusted $m_{ij}$ in any table are
necessarily “closer to the truth” than the corresponding sampling frequencies
$n_{ij}$, even under ideal conditions. As for the standard errors of the adjusted
results, they can easily be estimated for the ideal case by making use of the
calculated chi-square. For predictive purposes, however (which can be regarded
as the only possible use of a census by any method, sample or complete), it is
far preferable, in fact necessary, to get some idea of the errors of sampling by
actual trial, such as by a comparison of the sampling results with the universe,
as can often be arranged by means of controls. There is another aspect to the
problem of error: even a 100 per cent count, even though strictly accurate, is
not by itself useful for prediction, except so far as we can assert on other grounds
what secular changes are taking place.

In conclusion it is a pleasure to record our appreciation of the assistance of
Miss Irma D. Friedman and Mr. Wilson H. Grabill for putting the formulas
and procedure into actual operation with census data, and thereby disclosing
defects in earlier drafts of the manuscript.

Bureau of the Census,

Washington



NOTES 

This section is devoted to brief research and expository articles, notes on methodology
and other short items.


THE STANDARD ERRORS OF THE GEOMETRIC AND HARMONIC 
MEANS AND THEIR APPLICATION TO INDEX NUMBERS 1 

By Nilan Norris 

Attempts to derive useful expressions for estimating the standard deviations 
of the sampling errors of the geometric and harmonic means have not yielded
results comparable with those afforded by the modern theory of estimation, 
including fiducial inference. There are in the literature of probability theory 
certain theorems which can be applied to obtain these desired results in a 
straightforward manner. The use of forms for estimating standard errors is 
subject to certain conditions which are not always fulfilled, particularly in the 
case of time series. An understanding of these limitations should deter those 
who may be tempted to judge the significance of phenomena such as price 
changes solely on the basis of estimated standard errors of indexes. 

1. Statement of formulas. The standard error of the geometric mean of a
sequence of positive independent chance variables denoted by $x_i = x_1, x_2, \ldots, x_n$, is

$$\sigma_G = \theta_1 \frac{\sigma_{\log x}}{\sqrt{n}},$$

where $\theta_1$ is the population geometric mean of the variates;
$\sigma_{\log x}$ is the standard deviation of the logarithms in the population as
given by $\sigma_{\log x} = [E\{[\log x - E(\log x)]^2\}]^{1/2}$; and $n$ is the number of individuals
comprising the sample. The estimate of the standard error of the geometric
mean is

$$s_G = G \frac{s_{\log x}}{\sqrt{n - 1}},$$

where $G$ is the sample geometric mean, that is, the
estimate of $\theta_1$; $s_{\log x}$ is the estimate of $\sigma_{\log x}$; and $n - 1$ is the degree of
freedom of the sample.

1 This article summarizes two papers presented at sessions of the Institute of
Mathematical Statistics at Detroit, Michigan on December 27, 1938, and at Philadelphia,
Pennsylvania on December 27, 1939. The results given herein can be derived by several
methods, which vary somewhat as to degree of rigor. The writer wishes to acknowledge his
indebtedness to the referee for suggesting a proof based on a probability theorem stated
by J. L. Doob, "The limiting distributions of certain statistics," Annals of Math. Stat.,
Vol. 6 (1935), pp. 160-169. The standard deviation formulas obtained follow as an
application of this theorem, as will be seen by reference to it. Obviously the asymptotic variance
formulas of many other statistics (estimates of parameters) can be obtained in a similar
manner.

The standard error of the harmonic mean of a sequence of positive independent
chance variables denoted by $x_i = x_1, x_2, \ldots, x_n$, is

$$\sigma_H = \theta_2^2 \frac{\sigma_{1/x}}{\sqrt{n}},$$

where the population harmonic mean of the variates is $\theta_2 = 1/\alpha = [E(1/x)]^{-1}$;
$\sigma_{1/x}$ is the standard deviation of $1/x$ in the population,
$\sigma_{1/x} = [E\{[1/x - E(1/x)]^2\}]^{1/2}$;
and $n$ is the number of observations comprising the sample. The
estimate of the standard error of the harmonic mean is

$$s_H = \frac{s_{1/x}}{a^2 \sqrt{n - 1}},$$

where the estimate of $\alpha$ is given by $a = \frac{1}{H} = \frac{1}{n} \left(\sum 1/x_i\right)$; in which $s_{1/x}$ is the standard
deviation of the reciprocals of the observations comprising the sample; and
$n - 1$ is the degree of freedom of the sample.
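
The two estimates can be computed directly from a sample, as in the following Python sketch. It assumes that $s_{\log x}$ and $s_{1/x}$ are sample standard deviations taken with divisor $n$ (so that division by $\sqrt{n - 1}$ supplies the degrees-of-freedom correction); the text does not state the divisor, and the function names are hypothetical.

```python
import math
import statistics


def geometric_mean_se(xs):
    """Return (G, s_G) for positive observations xs."""
    n = len(xs)
    logs = [math.log(x) for x in xs]
    g = math.exp(sum(logs) / n)            # sample geometric mean G
    s_log = statistics.pstdev(logs)        # assumed: divisor n
    return g, g * s_log / math.sqrt(n - 1)


def harmonic_mean_se(xs):
    """Return (H, s_H) for positive observations xs."""
    n = len(xs)
    recips = [1.0 / x for x in xs]
    a = sum(recips) / n                    # a = 1/H, the estimate of alpha
    s_recip = statistics.pstdev(recips)    # assumed: divisor n
    return 1.0 / a, s_recip / (a * a * math.sqrt(n - 1))


g, s_g = geometric_mean_se([1.0, 2.0, 4.0])
h, s_h = harmonic_mean_se([1.0, 2.0, 4.0])
```

For the three observations 1, 2, 4 the geometric mean is 2 and the harmonic mean is 12/7; a sample this small serves only to check the arithmetic, since the formulas are asymptotic.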


2. Derivation of formulas. These forms can be obtained by application of
the Laplace-Liapounoff theorem 2 as follows. Let $x_i = x_1, x_2, \ldots, x_n$ be a set of
positive independent chance variables with the same distribution function,
where the expectations $E(x_i)$ and $E(x_i^2)$ exist, and where $\sigma_x^2 = E\{[x_i - E(x_i)]^2\} > 0$.
The last condition is imposed to eliminate the trivial case in which the $x_i$
are all equal and their distribution is confined to a single point. The geometric
mean of the $x_i$ is $G = (x_1 \cdot x_2 \cdots x_n)^{1/n}$, and the harmonic mean of the $x_i$ is

$$H = \frac{n}{\sum_{i=1}^{n} 1/x_i}.$$

It is necessary to assume that both $\sigma^2_{\log x}$ and $\sigma^2_{1/x}$ are finite, and that in the
case of both $\log x$ and $1/x$ at least one moment of order higher than two of the
respective variates is also finite. The requirement that the variance and at
least one moment higher than the variance be finite can be weakened in various
ways, but this is a trivial consideration, since nearly all distributions of any
importance have finite third moments. 3 Certain rarely occurring types of
distributions, such as the Cauchy distribution, have infinite variance. In such
cases, standard error formulas as ordinarily used are not valid.

Let $E(\log x) = \zeta$, and $E(1/x) = \alpha$. By the Laplace-Liapounoff theorem,
except for terms of order $1/\sqrt{n}$, the limiting distributions of

$$\frac{\sqrt{n}(\log G - \zeta)}{\sigma_{\log x}} \quad \text{and} \quad \frac{\sqrt{n}(H^{-1} - \alpha)}{\sigma_{1/x}}$$

are normal with zero arithmetic means and unit variances.
That is, if $C$ represents a set of conditions on chance variables, and $P\{C\}$ is the
probability that these conditions are satisfied, then

$$\lim_{n \to \infty} P\left\{ \frac{\sqrt{n}(\log G - \zeta)}{\sigma_{\log x}} < t \right\} = \lim_{n \to \infty} P\left\{ \frac{\sqrt{n}(H^{-1} - \alpha)}{\sigma_{1/x}} < t \right\} = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{t} e^{-x^2/2}\, dx.$$

2 A. Khintchine, Asymptotische Gesetze der Wahrscheinlichkeitsrechnung, Ergebnisse
der Mathematik und ihrer Grenzgebiete, J. Springer, Berlin, 1933, Vol. II, No. 4, pp. 1-8;
J. L. Doob, op. cit., pp. 160-169, and S. S. Wilks, Statistical Inference, 1936-1937, Edwards
Brothers, Inc., Ann Arbor, 1937, pp. 39 f.

3 For a more detailed discussion of this matter see Wilks, op. cit., pp. 39 f.


In order to use these relations in obtaining the limiting distributions of the
geometric and harmonic means, it is necessary to suppose that the sequence of
random chance variables, $V_n$, converges in probability (converges
stochastically) to $p$, and that the sequence of random chance variables, $\sqrt{n}(V_n - p)$, has
a normal limiting distribution with zero arithmetic mean and variance $\sigma^2$.
Also, it is necessary to assume that the real-valued function, $f(x)$, has a Taylor
expansion valid in the neighborhood of $p$. If $f'(p) \neq 0$, only the first two terms
of the series are needed. The required expansion is given by

$$f(x) = f(p) + (x - p) f'(p) + \frac{(x - p)^2}{2} f''[p + \theta(x - p)],$$

where $0 < \theta < 1$, and $f''(x)$ is continuous in the neighborhood of $p$. When these
conditions are fulfilled, the limiting distribution of $\sqrt{n}[f(V_n) - f(p)]$ is normal
with an arithmetic mean of zero and a variance of $\sigma^2 [f'(p)]^2$.

Let $f(\log G) = e^{\log G}$ and use the expansion given by

$$e^{\log G} = e^{\zeta} + (\log G - \zeta) e^{\zeta} + \frac{(\log G - \zeta)^2}{2} e^{\zeta + \theta(\log G - \zeta)}.$$

Since $\theta_1 = e^{\zeta}$, it follows that the limiting distribution
of $\sqrt{n}(G - \theta_1)$ is normal with an arithmetic mean of zero and a variance of
$\theta_1^2 \sigma^2_{\log x}$.

Similarly, it can be shown that the limiting distribution of $\sqrt{n}(H - \theta_2)$ is
normal with an arithmetic mean of zero and a variance of $\theta_2^4 \sigma^2_{1/x}$, where
$\theta_2 = 1/\alpha = [E(1/x)]^{-1}$.
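
The limiting variance $\theta_1^2 \sigma^2_{\log x}$ can be checked numerically. The Python sketch below is an illustration only: it assumes a lognormal universe (so that $\log x$ is normal with mean $\zeta$ and standard deviation $\sigma_{\log x}$), and the sample size, number of repetitions, and seed are arbitrary choices not taken from the text.

```python
import math
import random
import statistics


def simulated_variance_ratio(n=200, reps=2000, zeta=1.0, sigma=0.5, seed=0):
    """Ratio of the simulated variance of sqrt(n)(G - theta_1) to theta_1^2 sigma^2."""
    rng = random.Random(seed)
    theta1 = math.exp(zeta)              # population geometric mean
    scaled = []
    for _ in range(reps):
        # log x is N(zeta, sigma^2), so x itself is lognormal and positive.
        logs = [rng.gauss(zeta, sigma) for _ in range(n)]
        g = math.exp(sum(logs) / n)      # sample geometric mean
        scaled.append(math.sqrt(n) * (g - theta1))
    return statistics.pvariance(scaled) / (theta1 ** 2 * sigma ** 2)


ratio = simulated_variance_ratio()
print(ratio)   # expected to lie near 1 if the asymptotic formula holds
```

A ratio close to unity is consistent with the stated limiting distribution; the agreement would be expected to worsen for small $n$ or for universes with heavy tails in $\log x$.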

It is of some interest to observe that the expressions for the standard errors
of the geometric and harmonic means correspond with the forms previously
given for the standard errors of two efficient ratio-measures of relative variation, 4
namely,

[display expressions for $\sigma_{G/A}$ and $\sigma_{H/G}$ illegible in source]

where $\theta_1/\theta$ is the population geometric-arithmetic ratio, and $\theta_2/\theta_1$ is the
population harmonic-geometric ratio.


3. Limitations of standard-error estimates. Application of these forms is
subject to the usual conditions for drawing sound inferences on the basis of the
representative method. Fiducial argument should be employed to avoid certain
untenable assumptions of the outmoded method of using standard errors.
Estimates of the standard deviations of sampling errors do not constitute an
ultimate test of significance which can be applied with a high degree of success
to all types of problems. In general, such estimates cannot be relied upon with a
high degree of confidence when they are used as tests of significance for index
numbers, since in nearly all time series there exists an appreciable degree of
serial correlation, persistence, or lack of independence among successive items of
any sample.

4 Nilan Norris, “Some efficient measures of relative dispersion," Annals of Math. Stat.,
Vol. 9 (1938), pp. 214-220.

4. Bibliographical note. Certain aspects of the sampling distribution of the
geometric mean have been discussed by Burton H. Camp. 5 Attempts to derive
forms for estimating the standard errors of index numbers have been made by
Truman L. Kelley 6 and Irving Fisher, 7 and an empirical study of the sampling
fluctuations of indexes has been made by E. C. Rhodes. 8 Although various
special tests of significance for time series have been proposed, 9 at the present
time no generally satisfactory procedure has appeared.

Hunter College, 

New York, N. Y.


5 Burton H. Camp, "Notes on the distribution of the geometric mean,” Annals of Math.
Stat., Vol. 9 (1938), pp. 221-226.

6 Truman L. Kelley, "Certain Properties of Index Numbers,” Quarterly Publications of
Am. Stat. Assn., Vol. 17, New Series 135, Sept., 1921, pp. 826-841.

7 Irving Fisher, The Making of Index Numbers, Houghton Mifflin Company, New York,
1927, 3d ed., pp. 225-229, 342-345, and Appendix I, pp. 407 and 430 f.

8 E. C. Rhodes, “The precision of index numbers,” Roy. Stat. Soc. Jour., Vol. 99 (1936),
Part I, pp. 142-146, and Part II, pp. 367-369.

9 Some of the more recent papers dealing with this matter are: G. Tintner, “On tests of
significance in time series,” Annals of Math. Stat., Vol. 10 (1939), pp. 139-143; “The analysis
of economic time series," Am. Stat. Assn. Jour., Vol. 35 (1940), pp. 93-100; L. R. Hafstad,
“On the Bartels technique for time-series analysis, and its relation to the analysis of
variance,” Am. Stat. Assn. Jour., Vol. 35 (1940), pp. 347-361, and Lila F. Knudsen,
“Interdependence in a series,” Am. Stat. Assn. Jour., Vol. 35 (1940), pp. 507-514.


A NOTE ON THE USE OF A PEARSON TYPE III FUNCTION IN
RENEWAL THEORY

By A. W. Brown 

One of the methods suggested by A. J. Lotka 1 for the derivation of the renewal 
function may be briefly summarized as follows. 

The method consists of dissecting the total renewal function into "generations”.
The original installation constitutes the zero generation, the units
introduced to replace disused units of the zero generation constitute the first
generation, renewal of these the second, and so on. Let $f(x)$ be the "mortality”
function, the same for all generations; $f(x)$ is a function satisfying the usual
conditions of a distribution function. Adopting Lotka’s notation, let $N$ be the
number of units in the original collection, $B_1(t)\,dt$ the number of objects
introduced between times $t$ and $t + dt$ and belonging to the first generation, $B_2(t)\,dt$
a similar expression for the second generation, etc. $B_1(t)/N$, $B_2(t)/N$, $\ldots$ may
be regarded as renewal density functions for the various generations.

1 A. J. Lotka, “A Contribution to the Theory of Self Renewing Aggregates, With Special
Reference to Industrial Replacement,” Annals of Math. Stat., Vol. 10 (1939), p. 1.

Now, evidently,

(1)    $B_1(t) = N f(t)$

(2)    $B_2(t) = \int_0^t B_1(t - x) f(x)\, dx$

and in general

(3)    $B_{j+1}(t) = \int_0^t B_j(t - x) f(x)\, dx.$


Summation of the contributions of the successive generations gives for the total
renewal at the time $t$

(4)    $B(t) = B_1(t) + \int_0^t B(t - x) f(x)\, dx.$


In this note we propose to use a Pearson Type III function for $f(x)$ and observe
what form our equations then assume. The Pearson Type III function

$$f(x) = \frac{c^k}{\Gamma(k)} x^{k-1} e^{-cx}, \qquad (c > 0,\ k > 0),$$

appears to be a reasonable one to use in many
practical situations. The two parameters $c$ and $k$ give it a considerable amount
of flexibility. The fact that this function has an unlimited range in one
direction is relatively unimportant from a practical point of view, as is well known
from the experience of fitting curves of this type to skewed data with limited
range. Of course the question of whether a Type III curve is appropriate can
be answered more objectively by using the usual Pearson curve-fitting criteria,
$\beta_1$, $\beta_2$ and $\kappa$. We have, then, substituting in (1)

(5)    $B_1(t) = N \frac{c^k}{\Gamma(k)} t^{k-1} e^{-ct}$

and from (2)

(6)    $B_2(t) = \int_0^t N \frac{c^k}{\Gamma(k)} (t - x)^{k-1} e^{-c(t - x)} \frac{c^k}{\Gamma(k)} x^{k-1} e^{-cx}\, dx$

(7)    $\phantom{B_2(t)} = N \frac{c^{2k}}{[\Gamma(k)]^2} e^{-ct} \int_0^t (t - x)^{k-1} x^{k-1}\, dx.$

If, now, we set $x = ty$, the integral in (7) reduces to

$$t^{2k-1} \int_0^1 (1 - y)^{k-1} y^{k-1}\, dy = t^{2k-1} \frac{[\Gamma(k)]^2}{\Gamma(2k)}.$$

Hence,

(8)    $B_2(t) = N \frac{c^{2k}}{\Gamma(2k)} t^{2k-1} e^{-ct}$

and in general

(9)    $B_j(t) = N \frac{c^{jk}}{\Gamma(jk)} t^{jk-1} e^{-ct}.$


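Summing the contributions of the several generations, we have for the total
renewal function

(10)    $B(t) = N c e^{-ct} \left[ \frac{(ct)^{k-1}}{\Gamma(k)} + \frac{(ct)^{2k-1}}{\Gamma(2k)} + \frac{(ct)^{3k-1}}{\Gamma(3k)} + \cdots \right].$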
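
Series (10) can be summed numerically for any positive $k$, integral or not. The following Python sketch is one way to do so and is not the computational method of the note; the function name `renewal_density` and the stopping tolerance are hypothetical choices. Each term is formed on the logarithmic scale with `math.lgamma` to avoid overflow.

```python
import math


def renewal_density(t, k, c, tol=1e-15):
    """B(t)/N from series (10), for t > 0, k > 0, c > 0."""
    if t <= 0:
        raise ValueError("t must be positive")
    z = c * t
    total = 0.0
    j = 1
    while True:
        p = j * k
        # term = e^{-z} z^{jk-1} / Gamma(jk), computed via logarithms
        term = math.exp(-z + (p - 1.0) * math.log(z) - math.lgamma(p))
        total += term
        # Stop once the terms are past their peak and negligible.
        if p - 1.0 > z and term < tol:
            break
        j += 1
    return c * total


value = renewal_density(3.0, 2.0, 0.5)
```

For $k = 1$ the series sums to $c$ for every $t$, and for large $t$ the value approaches $c/k$, the reciprocal of the mean life, which gives two simple checks on the summation.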




If k is a positive integer > 3, (10) can be easily summed to a form which 
shows immediately its damped periodic nature. Even if k is positive but not 
an integer, it can be shown by continuity considerations that the function B(t) 
defined by (10) has periodic properties. 

Assuming $k$ to be a positive integer, then, and setting $z = ct$, we may write
the expression in brackets in (10) as

(11)    $\frac{z^{k-1}}{(k-1)!} + \frac{z^{2k-1}}{(2k-1)!} + \cdots = f(z).$

Then

$$\frac{d^k f(z)}{dz^k} = f(z)$$

and upon making the trial substitution, $f(z) = Ae^{mz}$, we get

$$A m^k e^{mz} = A e^{mz}.$$

Hence,

$$m^k = 1.$$

Taking unity in its complex form

$$1 = \cos 2n\pi + i \sin 2n\pi$$

we have that

(12)    $m_n = \sqrt[k]{1} = \cos \frac{2n\pi}{k} + i \sin \frac{2n\pi}{k},$

where $n = 0, 1, 2, \ldots, k - 1$. Then

$$f(z) = \sum_{n=0}^{k-1} A_n e^{m_n z}$$

and

$$f^{(j)}(z) = \sum_{n=0}^{k-1} A_n m_n^{j} e^{m_n z}, \qquad (j = 1, 2, \ldots, k - 1).$$

Now setting $z = 0$, we get

$$f(0) = A_0 + A_1 + \cdots + A_{k-1} = 0$$

$$f'(0) = A_0 m_0 + A_1 m_1 + \cdots + A_{k-1} m_{k-1} = 0$$

$$\cdots$$

$$f^{(k-1)}(0) = A_0 m_0^{k-1} + A_1 m_1^{k-1} + \cdots + A_{k-1} m_{k-1}^{k-1} = 1,$$

$k$ equations to determine the $k$ constants. We know that $A_n$ is equal to the
ratio of two determinants formed from the coefficients of the above equations.
This ratio reduces to

(13)    $A_n = \frac{1}{\prod_{j \neq n} (m_n - m_j)}.$

We have, then, an expression for the $k$ constants in terms of the $k$ roots of unity.
Therefore, for any particular value of $k$ we can obtain the sum of our series
from the relation

$$f(z) = \sum_{n=0}^{k-1} A_n e^{m_n z}.$$

Hence, under the assumption that $k$ is a positive integer, we have

(14)    $B(t) = N c e^{-ct} \sum_{n=0}^{k-1} A_n e^{m_n ct}.$

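The forms of $B(t)$ for $k = 1, 2, 3, 4$ are respectively

$$B(t) = Nc$$

$$B(t) = \tfrac{1}{2} N c (1 - e^{-2ct})$$

$$B(t) = \tfrac{1}{3} N c e^{-ct} \left[ e^{ct} - e^{-ct/2} \left( \cos \tfrac{\sqrt{3}}{2} ct + \sqrt{3} \sin \tfrac{\sqrt{3}}{2} ct \right) \right]$$

$$B(t) = N c e^{-ct} \left[ \tfrac{1}{4} (e^{ct} - e^{-ct}) - \tfrac{1}{2} \sin ct \right].$$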
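
The closed forms above can be compared with a direct evaluation of (14). The Python sketch below builds the $k$ roots of unity, takes each constant as $A_n = 1/\prod_{j \neq n}(m_n - m_j)$ (the Vandermonde solution of the $k$ equations, assumed here to be what equation (13) expresses), and returns $B(t)/N$; the function name is hypothetical.

```python
import cmath
import math


def renewal_density_integer_k(t, k, c):
    """B(t)/N from equation (14) for a positive integer k."""
    z = c * t
    roots = [cmath.exp(2j * math.pi * n / k) for n in range(k)]
    total = 0.0 + 0.0j
    for n, m_n in enumerate(roots):
        # A_n = 1 / product over j != n of (m_n - m_j); empty product is 1.
        denom = 1.0 + 0.0j
        for j, m_j in enumerate(roots):
            if j != n:
                denom *= (m_n - m_j)
        total += cmath.exp(m_n * z) / denom
    # The imaginary parts cancel in the sum; only the real part remains.
    return c * math.exp(-z) * total.real


c, t = 0.5, 2.0
values = [renewal_density_integer_k(t, k, c) for k in (1, 2, 3, 4)]
```

With $c = 0.5$ and $t = 2$ (so $ct = 1$) the four values agree with the four displayed forms to within rounding error, which is a useful guard against sign slips in the trigonometric terms.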

Although the above procedure is valuable particularly because it brings to
light something of the nature of our renewal function, the forms derived above
can be used actually to obtain values of $B(t)$ for various values of $t$. However,
for extensive numerical work a better method is at hand, which does not even
depend on the assumption of an integral value for $k$.

Let us return once again to equation (10) which may be written in the
following form

(15)    $B(t) = N c \left[ \sum_{j=1}^{\infty} \frac{e^{-ct} (ct)^{jk-1}}{\Gamma(jk)} \right].$

If $k$ and $c$ are determined by the method of moments (using two moments),
$k$ will not, in general, be a positive integer. However, by using the Tables of
the Incomplete Gamma Function edited by Karl Pearson, one can compute values
of $B(t)$ without much difficulty. In these tables the function $I(u, p)$ is tabulated
for various values of $u$ and $p$, where $I(u, p)$ is defined by

(16)    $I(u, p) = \frac{\int_0^{u\sqrt{p+1}} e^{-v} v^{p}\, dv}{\Gamma(p + 1)}.$

If we let $\xi = u_1 \sqrt{p + 1} = u_0 \sqrt{p}$ then upon integrating by parts we find

(17)    $\frac{e^{-\xi} \xi^{p}}{\Gamma(p + 1)} = I(u_0, p - 1) - I(u_1, p).$

The left hand member of this equation is of the same form as each of the terms
of the series in brackets in (15). Hence, the value of the renewal function for a
particular time, $t$, is directly obtainable by summation of the right hand member
of (17) for successive significant values of the argument $p$.
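
Identity (17) can be verified numerically without the printed tables. The Python sketch below evaluates Pearson's $I(u, p)$ from definition (16) by the standard power series for the incomplete gamma function; the helper names and the number of series terms are hypothetical choices, and the sketch stands in for a table lookup rather than reproducing Pearson's interpolation.

```python
import math


def reg_lower_gamma(a, x, terms=400):
    """(1/Gamma(a)) * integral_0^x e^{-v} v^{a-1} dv, by its power series."""
    return sum(
        math.exp(-x + (a + n) * math.log(x) - math.lgamma(a + n + 1.0))
        for n in range(terms)
    )


def pearson_I(u, p):
    """I(u, p) as defined in (16): upper limit u*sqrt(p+1), exponent p."""
    return reg_lower_gamma(p + 1.0, u * math.sqrt(p + 1.0))


def series_term(xi, p):
    """Left hand member of (17): e^{-xi} xi^p / Gamma(p+1)."""
    return math.exp(-xi + p * math.log(xi) - math.lgamma(p + 1.0))


xi, p = 4.62, 3.62            # e.g. ct = 4.62 and p = jk - 1 with j = 1, k = 4.62
u0 = xi / math.sqrt(p)        # so that xi = u0 * sqrt(p)
u1 = xi / math.sqrt(p + 1.0)  # so that xi = u1 * sqrt(p + 1)
lhs = series_term(xi, p)
rhs = pearson_I(u0, p - 1.0) - pearson_I(u1, p)
```

The two sides agree to rounding error, so each term of (15) with $\xi = ct$ and $p = jk - 1$ is a difference of two tabulated values.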

By way of illustration a numerical example will be considered. The data are 
taken from E. B. Kurtz’ book entitled Life Expectancy of Physical Property. 
In this book the author makes a study of retirement rates of fifty-two different 
types of physical property, and finds that their replacement curves fall into seven 
distinct groups. We consider here Group VII which happens to be the largest 
group, embracing seventeen different types of industrial equipment out of the 
fifty-two examined. Using Kurtz’ replacement data 2 we obtain for the value
of the first and second moments

$$\mu_1' = 10.002$$

$$\mu_2' = 121.71$$

and from these by the method of moments, we find

$$k = 4.62$$

$$c = .462.$$
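
The fitted constants can be recovered from the two moments in a few lines. The sketch assumes that both moments are taken about the origin and uses the Type III relations mean $= k/c$ and variance $= k/c^2$; the function name is hypothetical.

```python
def type_iii_from_moments(mu1, mu2):
    """Method-of-moments k and c from the first two moments about the origin."""
    variance = mu2 - mu1 ** 2      # second moment about the mean
    c = mu1 / variance             # from mean = k/c and variance = k/c^2
    k = mu1 ** 2 / variance
    return k, c


k, c = type_iii_from_moments(10.002, 121.71)
print(round(k, 2), round(c, 3))    # prints: 4.62 0.462
```

Under this assumption the computed values round to the published $k = 4.62$ and $c = .462$.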

We then proceed to calculate values of $B(t)/N$ by means of Pearson’s Tables, 3
obtaining the results shown in the following table.

      t    B(t)/N         t    B(t)/N
      0    .0000         10    .1049
      1    .0016         11    .1043
      2    .0103         12    .1028
      3    .0279         13    .1006
      4    .0486         14    .0990
      5    .0714         15    .0994
      6    .0867         16    .1009
      7    .0980         17    .1013
      8    .1039         18    .0992
      9    .1066         19    .0999
                         20    .0993

2 E. B. Kurtz, Life Expectancy of Physical Property, Ronald Press, 1930, Table 22, page 86.

3 With regard to the method of interpolation employed in the calculations, it should
be mentioned that it was found advisable to use the Mid-panel Central Difference Formula
(xxiii) on page xii of the introduction to Pearson's Tables; and that it is quite sufficient
for our purposes to calculate only first order terms.


In conclusion the author wishes to thank Professor S. S. Wilks for various 
suggestions he has made in connection with this note. 

Princeton University, 

Princeton, N. J. 


ESTIMATES OF PARAMETERS BY MEANS OF LEAST SQUARES

By Evan Johnson, Jr.

As a criterion for comparing estimates of a parameter of a universe, of known
type of distribution, the use of the principle of least squares is suggested. A
criterion may be stated in rather general terms. Its application to any given
problem presumes a knowledge of the distribution functions of the estimates
considered. In the present paper a criterion is set up and application of it is
made in the estimation of the mean and of the square of standard deviation of a
normal universe.

We shall use the symbol $\theta$ to represent a parameter to be estimated. It is
to be remembered that $\theta$ is a constant throughout any problem, that it represents
an unknown value, and that observations and functions of observations (called
estimates) are the only variables that occur. We shall use the symbols $x_i$,
$i = 1, 2, \ldots, n$, to represent observed values of the variable $x$ of the universe, and
the symbol $F$ to represent a given function of the observations $x_i$.

If we choose to consider a given function $F$ as an estimate of $\theta$, we are then
interested in the error $F - \theta$. This quantity differs from the so-called residual
of least square theory, since we are here interested in the difference between
computed and true values, rather than in the difference between observed and
computed values. To avoid any possible confusion we shall refer to $F - \theta$
as the error. Over the set of all samples of $n$ observations, $x_i$, the distribution
of the errors $F - \theta$ is expressed by means of the distribution function $f(F)$,
which may be computed from the known distribution function of the universe.
We shall assume that the function $f(F)$ has been normalized, so that
$\int_{\alpha}^{\beta} f(F)\, dF = 1$, where the interval from $\alpha$ to $\beta$ includes all possible values of $F$. The integral
$I = \int_{\alpha}^{\beta} (F - \theta)^2 f(F)\, dF$, associated with a given estimate $F$, may be thought
of as the average square error over the set of all samples.

In this notation we shall state a criterion for the judgment of estimates in
either of the two following forms:

Definition 1. Let $f_1$ be the distribution function of $F_1$, and $f_2$ that of $F_2$.
The estimate $F_1$ of $\theta$ will be judged better than the estimate $F_2$ if

$$\int_{\alpha}^{\beta} (x - \theta)^2 f_1(x)\, dx < \int_{\alpha}^{\beta} (x - \theta)^2 f_2(x)\, dx.$$

Definition 2. From a given class of functions, of which $F$ is a member, $F$ will
be called the best estimate if

(1)    $I = \int_{\alpha}^{\beta} (F - \theta)^2 f(F)\, dF$

is less than the corresponding integral for all other functions of the class.

It is to be observed that the integral $I$ is a function of the quantities $\theta$ and $f$.
From this is seen at once the distinction between the present problem of
minimizing the average square error and the similar problem of finding that point
around which the mean square value of the deviations of a variable is a minimum.
In the problem under consideration we wish to find the function $F$, or more
precisely its distribution function $f(F)$, for which $I$ takes its minimum with a
fixed value of $\theta$. In the alternative problem we have a given distribution $f$
and we wish to find the minimum of $I$ with respect to $\theta$.

A second observation to be made is that the integral $I$ can not be usefully
minimized in the sense of the general conditions of the calculus of variations.
The problem would be of the isoperimetric variety, with the side condition
$\int_{\alpha}^{\beta} f(x)\, dx = 1$. A solution might be expressed as the limit, as $a$ approaches zero,
of functions $f(x)$ with proper continuity conditions, such that

$$f(x) = 0 \text{ when } |x - \theta| \geq a, \qquad \int_{\theta - a}^{\theta + a} f(x)\, dx = 1.$$

Such a solution would be meaningless in practical statistical theory. Solutions
are to be expected, therefore, only in those cases where the class of functions,
from which $F$ is to be selected, is sufficiently restricted.

The two following examples illustrate both restrictions and possible
application of the theory.

As a first example let us consider the problem of finding an estimate $F$ of the
mean, $\bar{x}$, of a normal universe. The mean of a distribution is a symmetric
linear function of the variates of the distribution. For the class of functions
from which to select an estimate $F$ of $\bar{x}$, let us take the class of all symmetric
homogeneous linear functions of the observations $x_i$. Let

(2)    $F = a(x_1 + x_2 + \cdots + x_n).$

We wish to find the value of $a$, if any, for which $I$ is a minimum.

$F$ is the sum of $n$ normally distributed independent variables, $ax_i$, each with
standard deviation $a\sigma$. $F$, therefore, has a distribution function

$$f = C \exp\left( \frac{-(F - an\bar{x})^2}{2a^2 n \sigma^2} \right),$$

where $C$ is so chosen that $\int_{-\infty}^{\infty} f\, dF = 1$. A discussion of general distribution
functions may be found in Dunham Jackson’s article, “Theory of Small Samples,”
in the American Mathematical Monthly, Volume XLII, 1935. In this case it
can be shown without particular difficulty that

$$I = a^2 n \sigma^2 + \bar{x}^2 (an - 1)^2.$$

To determine the minimum of $I$ with respect to $a$, we set

$$\frac{dI}{da} = 2an\sigma^2 + 2\bar{x}^2 (an - 1) n = 0,$$

and obtain

(3)    $a = \frac{\bar{x}^2}{n\bar{x}^2 + \sigma^2} = \frac{1}{n} \cdot \frac{1}{1 + \sigma^2/n\bar{x}^2}.$



It is seen that for even such a simple example as the estimation of the mean
there is no estimate of the form of equation (2), with $a$ independent of the
parameter to be estimated, for which $I$ takes its minimum value.

For a distribution in which $\bar{x} \neq 0$, and $\sigma^2/n\bar{x}^2$ is small, $a$ is given as a first
approximation by $1/n$. The function $F$ is merely the mean of the sample
observations. If $\bar{x} = 0$, the required solution is $a = 0$, and there is no best least
square estimate of the type of equation (2).

In the case where $\sigma^2/\bar{x}^2$ is not small, as is apt to be the case when $\bar{x}$ is near
zero, the determination of a desirable estimate by least squares requires a
knowledge of the ratio $\sigma/\bar{x}$, which may perhaps be judged approximately in a special
problem. If this value is assumed known, the required value of $a$ may be found
most easily by rewriting equation (3) in the form

(4)    $a = \frac{1}{n + \sigma^2/\bar{x}^2}.$
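
A short numerical illustration of equations (3) and (4) follows. The Python sketch evaluates the average square error $I = a^2 n \sigma^2 + \bar{x}^2(an - 1)^2$ for the multiplier of equation (4) and for the ordinary choice $a = 1/n$; the function names and the particular values of $n$, $\sigma$ and $\bar{x}$ are hypothetical and serve only as an example.

```python
def mean_square_error(a, n, sigma, mean):
    """I = a^2 n sigma^2 + mean^2 (a n - 1)^2 for the estimate F = a * sum(x_i)."""
    return a * a * n * sigma ** 2 + mean ** 2 * (a * n - 1.0) ** 2


def best_multiplier(n, sigma, mean):
    """Equation (4): a = 1 / (n + sigma^2 / mean^2); requires mean != 0."""
    return 1.0 / (n + (sigma / mean) ** 2)


n, sigma, mean = 10, 2.0, 1.0
a_best = best_multiplier(n, sigma, mean)           # 1/14 for these values
i_best = mean_square_error(a_best, n, sigma, mean)
i_sample_mean = mean_square_error(1.0 / n, n, sigma, mean)
```

With these values the minimizing multiplier gives $I = 4/14$, against $I = 0.4$ for the sample mean, but only because the ratio $\sigma/\bar{x}$ was taken as known, which is precisely the limitation the text points out.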


The second example to be considered is the determination of an estimate of
$\sigma^2$ of a normal universe. A comparison with the definition of $\sigma^2$ suggests the
use of a function $F$ given by the equation

(5)    $F = a\{(x_1 - \bar{x})^2 + (x_2 - \bar{x})^2 + \cdots + (x_n - \bar{x})^2\},$

where $\bar{x}$ is the mean of the $n$ observations. The value of $a$ is, of course, to be
determined by minimizing the integral $I$.

$F$ is the sum of the squares of $n$ normally distributed but not independent
variables. It may be shown, however, (Jackson, loc. cit.) to be expressible as
the sum of the squares of $n - 1$ independent normally distributed variables, each
with standard deviation $\sqrt{a}\,\sigma$. The distribution function for $F$ takes the form

(6)    $f(F) = C\, F^{(n-3)/2} e^{-F/2a\sigma^2},$

$F$ taking only positive values, and $C$ is again chosen to normalize $f(F)$. The
integral $I$ may be written

$$I = C \int_0^{\infty} (F - \sigma^2)^2 F^{(n-3)/2} e^{-F/2a\sigma^2}\, dF.$$

The integration is most easily accomplished by replacing $F$ by $u^2$, and in terms
of $u$

$$I = C' \int_0^{\infty} (u^2 - \sigma^2)^2 u^{n-2} e^{-u^2/2a\sigma^2}\, du.$$

The various steps in the integration will differ for even and odd values of $n$,
but in each case the final result is the same. It is found that

(7)    $I = \sigma^4 \{a^2 (n^2 - 1) - 2a(n - 1) + 1\}.$

The value of $a$ which minimizes $I$ is determined from the relation

$$\frac{dI}{da} = \sigma^4 \{2a(n^2 - 1) - 2(n - 1)\} = 0.$$

Dividing by $(n - 1)$, which is not zero in a sample of two or more observations,
we obtain

(8)    $a = \frac{1}{n + 1}.$

In contrast to the previous example we have here an absolute minimum of $I$
with respect to all estimates of the type of equation (5). The best least square
estimate of this type is, therefore,

(9)    $F = \frac{(x_1 - \bar{x})^2 + (x_2 - \bar{x})^2 + \cdots + (x_n - \bar{x})^2}{n + 1}.$
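
Equation (7) makes it easy to compare the divisor $n + 1$ with the more familiar divisors $n$ and $n - 1$. The Python sketch below evaluates $I/\sigma^4$ for each; the function name and the choice $n = 5$ are hypothetical.

```python
def relative_mse(a, n):
    """I / sigma^4 from equation (7) for the estimate F = a * sum((x_i - mean)^2)."""
    return a * a * (n * n - 1) - 2.0 * a * (n - 1) + 1.0


n = 5
results = {divisor: relative_mse(1.0 / divisor, n) for divisor in (n - 1, n, n + 1)}
for divisor, value in results.items():
    print(divisor, round(value, 4))
```

For $n = 5$ the three values are 0.5, 0.36 and 0.3333, so the divisor $n + 1$ gives the smallest average square error, in agreement with (8), although the resulting estimate is not unbiased.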


Pennsylvania State College, 
State College, Pa. 



THE TEACHING OF STATISTICS 1 

By Harold Hotelling 

The very great increase in the teaching of statistics since the First World 
War has been associated on one hand with the development of statistical theory. 
This important series of discoveries has made available more and more powerful
and accurate statistical methods, and has also acquired an intellectual
interest of its own as embodying the modern version of the most important 
part of inductive logic and as providing scope for mathematical and logical 
ingenuity of high order. The increased teaching of statistics has also been 
associated with the rapidly growing applications of statistics in innumerable 
fields, made possible by the development of the theory, by the availability of 
persons having some knowledge of the theory, and by an increasing realization 
of the possibilities of application. Doubtless most students of statistics enter 
upon the subject, not for its intrinsic interest, but with the idea of applying 
statistical methods as a tool to some particular end. This object may be 
scientific research, or to fulfill a requirement for a degree, but is often connected 
with some purely practical pursuit offering the ready prospect of a remunerative 
job. But it would be a mistake to ignore those whose interest is more purely 
intellectual, who desire an insight into the peculiar problems of probable
inference and the structure of empirical knowledge, who wish to get a fundamental
acquaintance with one of the most fundamental of subjects, to see and
understand fully the mathematical derivations underlying so much practical and
scientific activity, and perhaps to make their own contributions. 

Of the magnitude of the demand for statisticians there can be no doubt. 
The realization of what statistical methods can do in a multitude of fields has 
gradually led the administrators of government agencies, directors of scientific 
organizations and research institutes, and business men, to employ rapidly 
increasing numbers of persons with some knowledge of statistical methods, and 
to accord an unusual degree of recognition and promotion in many such cases. 
The uses of statistical methods, and especially of sampling theory, are so varied 
that it is scarcely possible in a brief space to give any sort of survey of them. 
They enter, in one form or another, into the research work of the physicist, the 
chemist, the astronomer, the biologist, the psychologist, the anthropologist, 
the medical investigator, the economist, and the sociologist. Meteorology, 
which has lately acquired greatly increased importance, both civil and military, 
is with its masses of numerical observations very much a statistical matter. 
The engineer needs modern statistical methods both in the physical and in the
economic aspects of his plans. The work of W. A. Shewhart has made clear
the central importance of sampling theory in the economic control of quality
of manufactured articles. Business men who use sampling surveys to test
the markets for their products and the effectiveness of their advertising, who
employ statisticians to make up index numbers and forecasts of business
conditions, and whose manufacturing costs and quality are controlled with the
help of recently devised statistical methods, are finding more and more uses for
statisticians. Indeed, it seems as if the exploitation of the business and
manufacturing possibilities of statistical methods has only begun, and that limitless
further fields are coming into view. Insurance has of course always been
essentially dependent on statistics.

1 Address at the meeting of the Institute of Mathematical Statistics at Hanover, N. H.,
September 10, 1940.

But the most rapidly growing large class of positions for statisticians is at 
present in governmental activities. For some facts regarding the employment 
of statisticians by the federal government I am indebted to Dr. J. M. Thompson.
It appears that it has about one hundred agencies using statistics, with
almost eight hundred positions broadly classified as statistical or mathematical, 
in addition to more than six thousand generally classified as economists. The 
title “economist” covers many types of work, but much of it is largely statistical.
The nature of the government’s statistical work is varied and extensive.
It includes such work as forecasting revenue from taxes, prices and production 
of agricultural commodities, general demand conditions, and weather. Some 
of the work consists in analyzing the effects of various taxes on other programs.
In connection with proposed legislation, statisticians serving the lawmakers 
often attempt to outline the probable results of the legislation, as well as to 
assist in setting up definite formulae for carrying out the general policies aimed 
at in Acts of Congress. Administrators as well as lawmakers require statistical 
activities of a high order, exemplified in the Bureau of the Census, the Bureau 
of Agricultural Economics, and others. The scientific activities of the govern¬ 
ment, the work of the War Department, and many others that do not at first 
sight appear at all statistical, require the services of mathematical statisticians 
of high order. Even the judicial activities call for statistical theory of some 
of the most recently discovered kinds, as for instance in the investigation re¬ 
cently made of parole procedures. Cities and states, school and port authori¬ 
ties, employ numerous statisticians for other and widely diverse purposes. 

The growing need, demand and opportunity have confronted the educational 
system of the country with a series of problems regarding the teaching of statis¬ 
tics. Should statistics be taught in the department of agriculture, anthro- 
pology, astronomy, biology, business, economics, education, engineering, 
medicine, physics, political science, psychology, or sociology, or in all these 
departments? Should its teaching be entrusted to the department of mathe¬ 
matics, or to a separate department of statistics, and in either of these cases 
should other departments be prohibited from offering duplicating courses in 
statistics, as they are often inclined to do? To what students, and at what
stage of their advancement, should a course in statistics be administered? 
Should there be mathematical or other prerequisites? How much of an investment
in a statistical laboratory is warranted? Should courses be primarily
theoretical and mathematical, or should they be made as practical as possible,
equipping the student in the shortest possible time for a job as statistician or
for statistical work in the field with which a particular department is concerned?
What about degrees in statistics? Eclipsing all these in importance,
though it seems to have received too little of the attention of college and university
administrative officers, is the question, What sort of persons should be
appointed to teach statistics?

To pressing practical problems answers are sure to be given either by con¬ 
sidered policy or by processes of historical evolution. The latter are the more 
prominent in explaining the statistical teaching we have had. A synoptic
picture of the origins, not many decades ago, of a good deal of it would perhaps 
be something like this. A university Department of X, where X stands for 
economics, psychology, or any one of numerous other fields, begins to note 
toward the end of the pre-statistical era that some of the outstanding work 
in its field involves statistics. The quantity and importance of such work are 
observed to increase, while at the same time its intelligibility seems to diminish. 
Evidently students turned out with degrees in the field of X who do not know 
something about statistics are going to be handicapped, and are not likely to 
reflect credit on Alma Mater. The department therefore resolves that its 
students must acquire at least an elementary knowledge of the fundamentals 
of statistics. To implement this principle, it perhaps inserts some acquaintance
with statistics among the requirements for a degree. This situation
naturally calls for the introduction of a course in statistics. Accordingly the
head of the Department of X, in preparing the next Announcement of Courses,
writes:

“X 82. Elements of Statistics. An elementary but thorough
course designed to acquaint students of X with the fundamental concepts
of statistics and their applications in the field of X. The viewpoint
will be practical throughout. Second semester, MWF at 10.

“Instructor to be announced.”

The problem now arises of finding someone to teach the new course. The 
few well-known statisticians in the country have positions elsewhere from which 
it would be impossible to dislodge them with the bait to be offered; for though 
the department wishes to have statistics taught as an auxiliary to the study of 
X, it feels that there must be no question of the tail wagging the dog, and that 
economy is appropriate in this connection. The members of the department 
of professorial rank do not respond favorably to the suggestion that they should 
themselves undertake to teach the new and unfamiliar course. But every 
university department has a bright graduate student whose placement is an 
immediate problem. Young Jones has already demonstrated a quantitative turn 
of mind in the course on Money and Banking, or in the Ph.D. thesis on which 
he has already made substantial progress, dealing with The Proportion of
Public School Yard Areas Surfaced with Gravel. He may even recall having 
had a high-school course in trigonometry. His personality is all that might 
be desired. He is a white, Protestant, native-born American. And so, the 
“Instructor to be announced” materializes as Jones. 

This earnest young scholar now finds that, in addition to completing his 
thesis, he must look up the literature of statistics and prepare a course in the 
subject. His attention is directed by older members of the department to 
some of the research papers in the field of X involving statistics. He pursues 
“statistics” through the library card catalog and the encyclopedias. He reads 
about census and vital statistics, price statistics, statistical mechanics. Perhaps
he encounters probable errors. Eventually he learns that Karl Pearson
is the great man of statistics, and that Biometrika is the central source of infor¬ 
mation. Unfortunately most of the papers in Biometrika and of Pearson’s 
writings, while not lacking in vigor, trail off into mathematical discourse of a 
kind with which young Jones feels ill at ease. What he wants is a textbook, 
couched in simple language and omitting all mathematics, to make the subject 
clear to a beginner. Perhaps he finds the impressive books of Yule and Bowley, 
but decides that they are too abstruse. Elderton’s “Frequency Curves and
Correlation” is far too mathematical. Jones decides that a simple book on 
statistics must be written, and that he will do it if he can ever succeed in master¬ 
ing the subject. In the meantime, he contents himself perforce with the less 
mathematical writings of Karl Pearson, with applied examples in the field of X, 
and with such nonmathematical textbooks as may have been written by other 
young men who have earlier trod the same path as that on which Jones is now 
beginning. Somehow or other he gets the class through the course. After
doing this two or three times, Jones is an experienced teacher of statistics, and 
his services are much in demand. His course expands, takes on a settled form, 
and after a while crystallizes into a textbook. At the same time he may be 
getting out some research, consisting of studies in the field of X in which statis¬ 
tical methods play a part. His promotion is rapid. He becomes a Professor 
of Statistics, and perhaps an officer in a national association. His textbook 
has a large sale, and is used as a source by other young men writing textbooks 
on statistics. 

The textbooks written in this way form an interesting literary cycle. Measures
of “central tendency” and of dispersion are introduced, and the use of
one as against another of these measures is debated on every ground except 
the criterion that modern research has shown to be the important one, the 
sampling stability. Sampling considerations, indeed, get little attention. 
The urge to simplify by leaving out the more difficult parts of the subject, and 
especially the mathematical parts, is accompanied by pride in the great number 
of examples drawn from real life, that is, actual data that have been collected. 

But the most fascinating feature of this literary cycle is the opportunity it 
offers for research by the standard methods of literary investigation, tracing the 
influence of one author upon another through parallelism of [...]. This study is
facilitated by the errors in copying. One outstanding example is in certain
formulae connected with the
rank correlation coefficient, derived originally by Karl Pearson in 1907 and 
copied from textbook to textbook without adequate checking back. As one
error after another was introduced in this process, the formulae presented to
students (and apparently made the basis of class exercises involving numerical
substitution) became less and less like Pearson’s original equations. Incidentally,
in trying to check this original work of Pearson’s, recent investigation
has raised the suspicion that it is erroneous; at any rate, he does not give a fully 
adequate argument. Thus it may be that the errors in copying, which are so 
useful in examining the history of statistics, never did any harm. The formulae 
in which the students were drilled may have been no worse than they would 
have been if all the copying had been done with more care. 

While this process has been going on in the Department of X, the Y and Z 
Departments have likewise evolved the teaching of statistics. There is some 
interchange of ideas between the various statisticians on the campus and there 
is a catholicity in the copying of textbooks. But by and large, statistics is 
regarded in the Economics Department as a branch of economics, in the Psy¬ 
chology Department as a part of psychology, and so forth. The astronomer is 
inclined to resent the suggestion that his students should be called upon to study 
their least squares with anyone but an astronomer. Medical and biological 
investigators suspect Economics and Psychology of charlatanry, and do not 
look with favor on the idea of turning their own students over to such departments
for instruction in statistics. Most unthinkable of all would be putting
the Department of Education in charge of an essential part of the training of 
scientific students. Thus the courses multiply. 

The fact that it is essentially the same fundamental subject that is being 
taught under various names and with various kinds of notation in different 
departments is often concealed by including the teaching of statistical theory 
in a course whose title and prospectus are more suggestive of applications. A
case in point is that of an economist of my acquaintance, not primarily engaged
in teaching, who some years ago was invited to give a course in Price Forecasting
in the Economics Department of a leading university. He carefully prepared a
series of lectures on this subject, which had been the center of some extended
research he had conducted. A large class enrolled for the course. But soon
after beginning his series of lectures the economist noticed that the class was
growing restive. Upon inquiring what was amiss, he learned that his discourse 
was unintelligible to many of them because he was using technical statistical 
terms and concepts with which they were not familiar. He thereupon undertook
to use simpler language, and when this did not suffice to convey his meaning,
to explain the statistical notions involved in his work on price forecasting.
More and more his lectures came to deal with the elements of statistics, and less 
and less with price forecasting. At the end of the term he felt that he had 
given the students some elementary knowledge of statistical theory, for which
they had not enrolled and for which he did not feel particularly well qualified, 
but had taught them virtually nothing about price forecasting. When the 
invitation was repeated the next year, the economist suggested imposing a course 
in statistics as a prerequisite for the course in Price Forecasting. This however 
was vetoed by the head of the Economics Department, who did not believe in 
prerequisites. The Price Forecasting course was not repeated. 

This incident illustrates the evolution of a good deal of statistical teaching.
At the beginning, the idea is to teach some application, but the teacher soon 
finds himself engaged at much more length than expected with the fundamentals 
of statistical theory and methods. In this way it has come about that a large
number of persons are teaching theoretical statistics who initially had no inten¬ 
tion of doing so, but were concerned with particular applications. The teach¬ 
ing of statistical theory has been undertaken belatedly and inexpertly because 
it was necessary to a discussion of some application originally in view. Thus 
it happens that a good deal of teaching of statistics, even of mathematical 
statistics, masquerades as something else. 

The obvious inefficiency of overlapping and duplicating courses given inde¬ 
pendently in numerous departments by persons who are not really specialists 
in the subject leads to the suggestion that the whole matter be taken over by the 
Department of Mathematics. This is a promising solution, but it is doomed to 
failure if, as has sometimes happened, it means that the teaching of statistics 
is put under the jurisdiction of those who have no real interest in it. Moreover 
the teaching of statistics cannot be done appreciably better by mathematicians 
ignorant of the subject than by psychologists or agricultural experimenters 
ignorant of the subject. The latter indeed have a certain advantage in that the 
problems seem more real and definite to them; they can sense the difference 
between the important and the unimportant questions, even if they cannot 
express the questions in clear mathematical language, and can sometimes arrive
intuitively at a correct result that leaves the mathematician puzzled. Also, 
they can understand more readily than can the mathematician the examples, 
drawn largely from biological material, which play so important a part in some 
of the leading expository work on statistics, such as R. A. Fisher’s Statistical
Methods for Research Workers. The pure mathematician has only one advantage
over the non-mathematical worker in empirical fields: he is able to set about
reading the serious literature of statistical theory. But he must still find this 
scattered literature, sort it out from a mass of rubbish, fallacies, and false starts, 
and trace it back historically until he can understand the notation and the presuppositions.
He must also contend with the fact that a good deal that is important
in statistics is still a matter of oral tradition, and some consists of laboratory
techniques. In short, he needs a teacher before he himself sets out to
teach the subject. When a Department of Mathematics calls in a young Ph.D., 
however brilliant, to teach statistics as a part or all of his program, the best 
thing it can do, if he has not already had a training in modern statistics, is to 
give him a furlough for a year or two to enable him to go where he can acquire
such a training. 

Qualifications of a good teacher of statistics include, first and foremost, a 
thorough knowledge of the subject. This statement seems trivial, but it has 
been ignored in such a way as to bring about the present unfortunate situation. 
Mathematicians and others, who deplore the tendency of Schools of Education 
to turn loose on the world teachers who have not specialized in the subjects they 
are to teach, would do well to consider their own tendency to entrust the teach¬ 
ing of statistics to persons who not only have not specialized in the subject, 
but have no sound knowledge of it whatever. A knowledge of theoretical 
statistics is not easy to obtain. There is no comprehensive treatise on the sub¬ 
ject, starting from first principles, and proceeding by sound deductions and 
well-chosen definitions to the methods that need to be used in practice. (I 
have been trying for years to write such a treatise, but it has turned out to be a 
bigger task than at first appeared. This is partly because some things formerly 
thought to have been proved turn out, on critical examination, not to be sound, 
and much new research has been necessary.) The literature is scattered through 
journals pertaining primarily to many kinds of applications, and it is only in 
recent years that any large proportion of the current contributions to statistical 
theory and methods have been gathered into a few periodicals devoted to sta¬ 
tistical theory. On the other hand, the seeker after truth regarding statistical 
theory must make his way through or around an enormous amount of trash 
and downright error. The great accumulation of published writings on statis¬ 
tical theory and methods by authors who have not sufficiently studied the sub¬ 
ject is even more dangerous than the classroom teaching by the same people. 

A good teacher of statistics needs of course a mathematical background, including
at least an acquaintance with the theory of functions and n-dimensional
euclidean geometry. A good deal of additional algebra and analysis is likely
to be helpful, as well as some differential geometry. But no amount of [...]
mathematics constitutes by itself any approach to [...] the qualifications
of a teacher of statistics. The most essential thing is that the man shall
know the theory of statistics itself thoroughly from the ground up, including
[...] to apply them in various empirical fields.

[...] and the knowledge of statistical theory, a teacher of
statistics needs a really intimate acquaintance with the problems of [...]
empirical subjects in which statistical methods are applied [...] important.
Sometimes excellent mathematicians [...]
students through failure to get that feeling for applications that is necessary
[...] that some of the first things that need to be said [...]
for prospective practical statisticians are [...] some of the most recent
researches. So elementary a question as “What definition is it wise to give to
the term ‘standard deviation’?”, which must be faced by every teacher of
Statistics 1, requires for an intelligent answer a rather thorough understanding 
of modern sampling theory and techniques. The answer, it now seems, is 
not the definition given in most textbooks. In the selection of a statistic to
represent a parameter, for example in fitting frequency curves or in linkage
estimation in genetics, the fundamental consideration is connected with the 
sampling distribution, as R. A. Fisher showed in founding the modern theory of 
estimation. This is ignored in most of the current teaching of statistics, with 
the result that innumerable students are sent out to waste the money and time 
of their employers by demanding larger samples than are necessary for the pur¬ 
poses in view, wasting costly information by calculating inefficient statistics 
and using tests that are not the most powerful. On the other hand, students of 
statistics who are taught rule-of-thumb methods without their derivations are 
never quite conscious of the exact limitations and assumptions involved, and 
may make unwarranted inferences from samples that are too small or in some 
way violate the conditions underlying the derivations of the formulae. 
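
The cost of an inefficient statistic can be made concrete with a small simulation. The sketch below is an illustration added under stated assumptions and is not drawn from the address: it assumes a normal population, takes the sample mean and the sample median as two competing estimates of the population center, and uses an arbitrary sample size (25), number of trials (4000), and random seed. The helper name sampling_variance is likewise hypothetical.

```python
import random
import statistics


def sampling_variance(estimator, n, trials, rng):
    """Variance of an estimator across repeated normal samples of size n."""
    estimates = []
    for _ in range(trials):
        sample = [rng.gauss(0.0, 1.0) for _ in range(n)]
        estimates.append(estimator(sample))
    return statistics.pvariance(estimates)


rng = random.Random(1940)  # fixed seed so the run is repeatable
n, trials = 25, 4000       # illustrative choices, not taken from the text

var_mean = sampling_variance(statistics.mean, n, trials, rng)
var_median = sampling_variance(statistics.median, n, trials, rng)
ratio = var_median / var_mean

print(f"variance of sample mean:   {var_mean:.4f}")
print(f"variance of sample median: {var_median:.4f}")
print(f"ratio (median / mean):     {ratio:.2f}")
```

For a normal population the ratio approaches pi/2, about 1.57, in large samples, so an investigator who uses the median in place of the mean needs roughly half again as many observations to reach the same precision.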

A good teacher of statistics must be thoroughly familiar with these recent
advances. He must examine very critically textbook statements unsupported 
by full proofs. Even though the students are not capable of following the 
complete mathematical argument—indeed, especially if the students are not to 
examine it—the instructor needs to give it a critical study. The custom of 
omitting proofs, which would not be tolerated in pure mathematics beyond 
a very limited extent, is common in the teaching of statistics, and is excused on 
the ground that the students do not know enough mathematics to understand 
the proofs. Perhaps in some cases a better reason is that the teachers, and the 
authors of the textbooks, do not understand the proofs. In some instances 
no proofs exist, and in some instances no genuine proofs can exist, because the 
methods taught are demonstrably wrong. The custom prevalent in the teach¬ 
ing of mathematics of going over each proof carefully in the class is, among other 
things, a safeguard against infiltration of false propositions. This safeguard is 
missing from most of the teaching of statistics, and there has been an infiltration 
of errors. Since it is accepted that a great many students need to learn some¬ 
thing about statistical methods without learning enough mathematics to under¬ 
stand the proofs, it follows that the elementary teaching of statistics to these 
students must, if the perpetuation of gross errors is to be avoided, be in the 
hands of really competent mathematical statisticians. This is perhaps the
greatest reform needed in the teaching of statistics today. Until the elementary 
teaching of statistics is conducted by those with a thorough and critical knowl¬ 
edge of current research in statistical theory, of a sort that seems virtually 
inseparable from participation in that research, there is likely to be a continua¬ 
tion of the laborious drilling of thousands of students in methods that ought 
never to be used. Here, of all places, is the great need for participation of 
research workers in elementary teaching. 

Teachers and textbook writers might well abandon the idea of telling what
statistical methods are used, and say instead what methods ought to be used.
But before they can do this with confidence they must have a very close ac¬ 
quaintance with the research of the last three decades in statistical theory. 

How can an appointing officer know whether a prospective teacher of statistics 
knows his subject? This question requires no answer peculiar to statistics in 
distinction from other subjects. Publication of research, constituting a contribution
to the particular field, has always been accepted as the best proof. A
substantial contribution to fundamental statistical theory, which is to be distinguished
from the mere application of known statistical methods to empirical
data, is the best indication of the kind of scholarship appropriate to a teacher of
statistics.

Participation in research is not novel as a criterion of what constitutes a good 
teacher of a college or university subject, if the subject is Greek literature, 
physics, chemistry, biology, or indeed any of those departments that have been 
long enough established to attain with respect to the organization of their teach¬ 
ing a state approximating equilibrium. The more reputable institutions of 
higher learning have long maintained the principle, though with occasional 
violations in practice, that the Ph.D. degree or its equivalent, representing among
other things the completion of a piece of scholarly research, is a minimum 
condition for a regular faculty appointment. It has usually been maintained 
also that the Ph.D. thesis should be a new contribution of a strictly scholarly 
character to the field of the scholar’s competence, and not merely a routine 
application of known methods to an extraneous field. Thus a thesis offered for 
the Ph.D. degree in mathematics would be judged by its contribution to mathe¬ 
matics, rather than to physics or accounting. Moreover the regard in which 
universities have held members of their faculties has been intimately connected 
with their output of scholarly research. Other criteria of excellence have not 
been ignored, but research has been recognized in a fairly consistent manner. 
Some say that there has been an over-emphasis on research, and that more attention
ought to be given to other qualities related to teaching. However
this may be, the facts remain that scholarly research is something capable
of a reasonably objective evaluation by scholars in the field, that it offers the
main hope of fundamental progress, and that familiarity with current research
is a necessary, though not sufficient, condition for the most important teaching
in institutions of higher learning.

A peculiarity of the teaching of statistics, of which in practice the theory of
statistics is, even if unacknowledged, [...] part, is [...]
has been conducted by persons engaged in research, not [...]
mathematics were in the hands of an assortment of [...]
concreteness by such arrangements, with the accompanying [...]
particular applications of the fundamental sciences. Moreover the engineer
might in the course of such teaching refresh his own knowledge of elementary 
mathematics, while the physician might gain by renewing his acquaintance with 
elementary biology. Such arrangements might occasionally be made with
profit. But if they were the general rule the advantages of specialization would 
be lost; the fundamental sciences would not be developed in so well-rounded a 
manner as they are by specialists in them, while the special skills and knowledge 
of the physician and engineer could not be utilized to the full in their respective 
professions. Statistical theory is a big enough thing in itself to absorb the full¬ 
time attention of a specialist teaching it, without his going out into applications 
too freely. Some attention to applications is indeed valuable, and perhaps 
even indispensable as a stage in the training of a teacher of statistics and as a 
continuing interest. But particular applications should not dominate the 
teaching of the fundamental science, any more than particular diseases should 
dominate the teaching of anatomy and bacteriology to pre-medical students. 
These subjects are not ordinarily taught by practicing physicians, but by anat¬ 
omists and bacteriologists respectively. 

In medical education the principle has been accepted, after a long struggle, 
that a medical school should have full-time professors engaged primarily in 
teaching and research, and that such professors should not treat patients except 
in cases of unusual interest from the standpoint of the science or art of medicine. 
An analogous principle would be that an institution offering extensive instruc¬ 
tion in statistics should have full-time professors engaged in the teaching of and 
research in statistical theory and methods, without spending time over applied 
statistical problems excepting insofar as such problems might present novel
features calling for the development of new statistical methods or theoretical 
extensions having interest going beyond the immediate case. Sometimes the 
complaint is heard in medical schools that the teaching tends to become too
theoretical on account of detachment from clinical practice, and a similar difficulty
might conceivably develop in connection with statistics; but in neither
case does the trouble seem to be beyond the ability of the personnel involved to 
cure if they have the right background. 

A specialist in statistics on a university faculty has a threefold function. In
addition to the usual duties of teaching and research, there is a need for him to 
advise his colleagues, and other research workers, regarding the statistical 
methods appropriate to their various investigations. The advisory function is 
a highly important one for the activities of the university as a whole, and should 
be taken into consideration in adjusting the teaching load. Probably every 
university statistician is visited from time to time by earnest research workers, 
deeply engrossed in their respective specialities, speaking technical jargons un¬ 
familiar to the statistician, and seeking his advice on matters concerning which 
he has a sinking feeling of lack of comprehension. After some hours of psychoanalyzing
his visitor the statistician may be able to ascertain what it is he really
wants to know, and thereafter either refer him to some standard formula, or
(more often) undertake a piece of new mathematical research designed to fit
the particular problem, and very possibly having value also for a more extended 
class of problems. The statistician is then very likely to find himself embarked 
on a co-operative research venture in a field that is new to him. 

To function well in this third, the consultative or co-operative function, he 
must have an unusually large store of general information. No one stands in 
greater need than he of that knowledge of “something about everything and 
everything about something” that was once said to be the goal of a liberal 
education. In planning the education of statisticians and teachers of statistics
these considerations point to a somewhat wider diffusion of studies among various
fields than is customary in many institutions, especially in graduate work.
The co-operation, and their other work, would also be facilitated if research 
workers in general were more strongly urged to get a training in mathematical 
statistics at an early stage in their careers. 

The problem of departmental organization is secondary to that of getting men 
having the requisite qualities of extensive mathematical preparation, a thorough
knowledge of modern theoretical statistics, an understanding of some fields at 
least in which statistical methods can be applied, and the type of inquiring 
mind sometimes described as a “research outlook.” A Department of Mathe¬ 
matics may well handle the fundamental teaching in statistics, provided it has 
men properly qualified for such teaching. If it does not have such men, its 
teaching of statistics and its inability to provide the needed statistical advice 
will inevitably tempt the other departments to set up again their own duplicat¬ 
ing courses in what amounts essentially to statistical theory and methods, and 
to repeat the mistakes of the past. 

A separate Department of Statistics, if competently staffed, could very well 
provide advice for the whole institution as well as conducting elementary in¬ 
struction in statistical methods and theory, both for students having calculus 
and for those without it, and should certainly carry on advanced teaching and 
research in statistical theory and methods. But for efficient functioning of the 
institution as a whole it should be agreed that the Department of Statistics or
the Department of Mathematics should do all the elementary instruction in 
statistics, and that courses in statistics in other departments should be confined 
to applications of the basic theory. Normally such courses in applied statistics 
in the other departments should require as a prerequisite one or more of the basic
courses in the Department of Statistics, or of Mathematics. The basic course 
to be required as a prerequisite to others should be the one which itself requires 
calculus as a prerequisite wherever this is practicable. It is practicable for 
students of engineering, physics, astronomy, and mathematical economics, since
these students must have calculus anyhow. Moreover the value of the sequence
consisting of calculus, statistical theory and applied statistics, in this
order, is so great that many other students are likely to avail themselves of it 
when it is once established and the true nature and value of statistics are more 
widely understood. 

Exactly how far a Department of Statistics should go in particular applications
would have to be decided anew from time to time by its members in the
light of changing conditions and interests. It cannot teach everything that goes 
by the name of statistics. This problem may be exemplified by the case of 
population and vital statistics. This is a field with close connections with sociology,
biology, medicine and insurance. It is cultivated in conjunction with
each of these subjects in various places. Some of its most interesting and important
phases make use of quite advanced mathematics, as in the work of
A. J. Lotka, and in addition there is extensive use, and more extensive need, of
the statistical methods centered around sampling theory which are the appropriate
domain of a Department of Statistics. Should the study of population
and vital statistics be included in a Department of Statistics? I think not, 
except as a temporary arrangement, or in a small institution, in spite of the 
history of the word “statistics,” which originated in connection with material
of this kind, and in one of its meanings is still applied to it. (My use of the 
unqualified word “statistics” in this paper is in the sense of theory and methods, 
not in the sense of statistical facts such as those found by the census.) Medical, 
biological and sociological considerations are prominent in the problems of vital 
statistics, and one of these departments might well handle the subject. But 
the vital statistician, like other research workers, should have acquired in the 
course of his training an intimate familiarity with the statistical theory and 
methods which are the appropriate province of a Department of Statistics. 
He also needs mathematics through integral equations, if he is to understand and 
extend the contributions of Lotka and Volterra. Students of vital statistics 
should have had an elementary course in statistical theory in the Department of 
Statistics, preferably the course requiring calculus. 

A course in price statistics should be taught by an economist, presumably in 
the Department of Economics, but might well require as a prerequisite the same 
elementary courses in statistical theory and methods as would be required in 
psychology, medicine and other fields. In addition, there are problems of time 
series analysis whose treatment calls for a mathematical statistician having some 
acquaintance with both economic and meteorological data. A course on the 
treatment of time series might appropriately be included in the Department of 
Statistics, requiring the general elementary course as a prerequisite, and itself 
serving as a prerequisite for courses in economic and meteorological statistics. 

One of the chief obstacles to efficient organization of teaching is the habit of 
not prescribing prerequisites outside one’s own department. But when once 
the elementary courses in statistics have become established in the hands of
well-equipped specialists in statistical theory and methods, in whose competence
general confidence can be reposed, the various departments of application will 
lose their motive for establishing their own duplicating courses, and will be able 
to cultivate more intensively their respective specialities. 


The detection of biases and the details of practical statistical work vary greatly 
from one application to another. These, consequently, are matters for the departments
concerned with applications rather than with the fundamentals of
statistics, and should not be the chief features of a course in elementary statistical
methods and theory. The work of a Department of Statistics should be
concerned largely with sampling theory, and should emphasize the unity of
statistical methods and theory, regardless of the field of application. It should
deal with statistics as a coherent science of inductive inference, of the preparation
of observations for inference, and of the planning of investigations so as to
yield observations from which inferences can best be made. 

The question what mathematical prerequisites should be established for the 
fundamental course in statistical theory must be answered by a compromise 
between the ideal and what is expedient at a particular time and place. In 
Europe a large number of students have had a year of calculus before coming to 
universities, that is, before reaching the age of eighteen. If a university were 
willing to restrict its entrants to such students (thus automatically solving the 
problem of overcrowding) it could give them another year of calculus, mixed
perhaps with advanced algebra and geometry, and then in their sophomore year 
give them a thorough course in elementary statistics and probability, based on 
calculus. These students would then be ready to tackle advanced statistics in 
the third year in a really effective way. If the teaching of economic theory, 
physics, chemistry and astronomy were geared to this program in such a way as 
to make real use of the calculus, the work in these subjects could be made far 
more efficient, in the sense that more material could be covered effectively in 
the allotted time, or an equivalent amount of material in less time. If, in addition,
all the many departments in which statistical methods and theory are used
required these statistical courses as prerequisites, and actually used the materials
of these courses in their work, there would be a further huge gain in efficiency.
The baccalaureate degree of such an institution would represent a far
more thorough knowledge, and command of the tools of research, than is possible
without an arrangement putting in this way the fundamentals first. 

Institutions unwilling to undertake such a drastic improvement must face 
more or less delay and inadequacy in the acquisition by their students of the
fundamentals of mathematics and of statistics. A division of the students into 
groups according to mathematical ability ought to be undertaken, and followed 
by a corresponding division of the elementary statistics course. Students having 
high mathematical ability could begin the study of statistics after completing 
calculus, and could look forward to rising ultimately to greater heights in pursuits
involving mathematical or statistical knowledge than those of lesser mathematical
talents. For these latter there would still be the possibility of acquiring,
even without calculus, useful statistical tools; but it is essential that this
should be done under the guidance of instructors thoroughly familiar with the
mathematics of statistics. The task of leading the blind must not be turned
over to the blind. Students possessing the ability to master the calculus should
be encouraged to begin the study of statistics with the course having calculus
as a prerequisite, and should not be put into the necessarily slower group not 
having the calculus. I believe that these elementary courses should begin with 
the theory of probability, but should go on to the chief distribution functions 
used in practice, and should include applied problems and work on calculating 
machines. 

Putting a sound program of statistical teaching into effect will take time, 
partly because of the scarcity of suitable teachers of statistics. Nevertheless,
the process is well under way, and the prospects are good for substantial improvements
in the teaching of statistics. A body of able young research men
possessing the requisite knowledge of statistical fundamentals is now in existence 
and is growing. Some of the recent textbooks represent striking improvements. 
The Institute of Mathematical Statistics itself, with the Annals of Mathematical 
Statistics, is perhaps the best evidence of a changed view making for better 
things. 

Columbia University, 

New York, N. Y. 


DISCUSSION OF PROFESSOR HOTELLING’S PAPER 
By W. Edwards Deming 

It is a pleasure to endorse Professor Hotelling’s recommendations; in fact we 
have been following them pretty closely in the courses in the Graduate School 
of the Department of Agriculture. As a matter of fact, he has indirectly played 
an influential part in building up this set of courses, because some of our best 
instructors are his former students.

Listening to Professor Hotelling’s paper, I was thinking of the possibility
that some of his recommendations might be misunderstood. I take it that they 
are not supposed to embody all that there is in the teaching of statistics, because 
there are many other neglected phases that ought to be stressed. In the Bureau 
of the Census the population division alone has augmented its force by approximately
3500 statistical clerks during the past six months. They come from
diverse schools and it has been interesting to observe how many of them have the
idea that all the problems of sampling and inference from data can be solved by
what are commonly known as modern statistical techniques: correlation coefficients,
rank correlation coefficients, chi-square, analysis of variance, confidence
limits, and the like. Most of them are shocked to learn that many of
the so-called modern “theories of estimation” are not theories of estimation at 
all, but are rather theories of distribution and are a disappointment to one who is 
faced with the necessity of making a prediction from his data, i.e., of basing 
some critical course of action on them. The conviction that such devices as
confidence limits and Student's t provide a basis for action regardless of the
size of the sample whence they were computed, even under conditions of statistical
control, is too common a fallacy. On the other hand, many simple but
worthy devices are neglected. A histogram, for instance, can be a genuine
tool of prediction if it is built up layer by layer in different legends so as to distinguish
the different sources whence the data are derived. The modern student,
and too often his teacher, overlook the fact that such a simple thing as a scatter 
diagram is a more important tool of prediction than the correlation coefficient, 
especially if the points are labeled so as to distinguish the different sources of the 
data. Most students do not realize that for purposes of prediction the consistency
or lack of it between many small samples may be much more valuable
than any probability calculations that can be made from them or from the entire
lot. Students are not usually admonished against grouping data from heterogeneous
sources. Of those that are not guilty of indiscriminate grouping, many
are inclined to rely on statistical tests for distinguishing heterogeneity, rather 
than on a careful consideration of the sources of the data. Too little attention 
is given to the need for statistical control, or to put it more pertinently, since 
statistical control (randomness) is so rarely found, too little attention is given 
to the interpretation of data that arise from conditions not in statistical control. 

Nevertheless, the fundamentals of probability and sampling theory, and the 
mathematics of the distribution functions, though by themselves they do not 
qualify anyone for high-grade statistical work, are ultimately essential for proficiency
in statistics. Since they are seldom learned away from the university
they are properly made the main theme of teaching. The university is the 
place to learn the studies that are so difficult to get outside of it. 

Above all, a statistician must be a scientist. The skepticism of many first 
class scientists of today for modern statistical methods should be a challenge to 
statistical teaching. A scientist does not neglect any pertinent information, 
yet students of statistics are often taught to do just the opposite of this, and are 
accused of being old-fashioned for daring to think of combining experience with 
the new information provided by a sample, even if it is a pitifully small one.
Statisticians must be trained to do more than to feed numbers into the mill and 
grind out probabilities; they must look carefully at the data, and take account 
of the conditions under which each observation arises. It is my feeling that 
the chief duty of a statistician is to help design experiments in such a way 
that they provide the maximum knowledge for purposes of prediction; another 
is to compile data with the same object in view; and still a third function is 
to help bring about some changes in the source of the data. Scientific data 
are not taken merely for inventory purposes. There is no use taking data if 
you don’t intend to do something about the sources whence they arise. 

Bureau of the Census,

Washington 


RESOLUTIONS ON THE TEACHING OF STATISTICS

The Institute of Mathematical Statistics at its business meeting on September 
11, 1940 at Dartmouth College adopted the following resolutions regarding the 
teaching of statistics. The resolutions were drawn up by a committee appointed 
by the President, and consisting of Burton H. Camp, W. Edwards Deming, 
Harold Hotelling, and Jerzy Neyman. 

1. If the teaching of statistical theory and methods is to be satisfactory, it 
should be in the hands of persons who have made comprehensive studies of the 
mathematical theory of statistics, and who have been in active contact with 
applications in one or more fields. 

2. The judgment of the adequacy of a teacher’s knowledge of statistical 
theory must rest initially on his published contributions to statistical theory, in
contrast with mere applications, in a manner analogous to that long accepted in 
other university subjects. 

3. These ideas are expressed in detail in the paper The teaching of statistics, 
by Professor Harold Hotelling, and the Institute decides to give both the 
resolution and the paper as wide a circulation as possible. 



REPORT OF THE HANOVER MEETING OF THE INSTITUTE 

The sixth meeting of the Institute of Mathematical Statistics was held at 
Dartmouth College, Hanover, New Hampshire, Tuesday to Thursday, September
10 to 12, 1940, in conjunction with meetings of the American Mathematical
Society and of the Mathematical Association of America. The following
forty-two members of the Institute attended the meeting:


H. E. Arnold, Felix Bernstein, G. W. Brown, J. H. Bushey, B. H. Camp, A. T. Craig,
A. R. Crathorne, J. H. Curtiss, J. F. Daly, W. E. Deming, J. L. Doob, Churchill Eisenhart,
M. L. Elveback, C. H. Fischer, M. M. Flood, R. M. Foster, T. C. Fry, H. P. Geiringer,
Robert Henderson, E. H. C. Hildebrandt, G. M. Hopper, Harold Hotelling, E. V. Huntington,
M. H. Ingraham, Dunham Jackson, W. L. Kichline, L. F. Knudsen, B. A. Lengyel,
W. G. Madow, J. W. Mauchly, Richard von Mises, E. B. Mode, Jerzy Neyman, P. S. Olmstead,
Oystein Ore, M. M. Sandomire, L. W. Shaw, F. F. Stephan, A. G. Swanson,
Abraham Wald, S. S. Wilks, Jacob Wolfowitz.


The meeting of the Institute consisted of four sessions. At the first session, 
which was held on Tuesday morning, Professor Harold Hotelling of Columbia
University delivered an address on The Teaching of Statistics. This address
was followed by considerable discussion on the various aspects of the teaching
of statistics.¹ Preceding Professor Hotelling's address a short paper on an
Empirical Comparison of the “Smooth” Test for Goodness of Fit with Pearson’s
Chi-Square Test was presented by Professor J. Neyman of the University of
California.

Following Professor Hotelling's address a business meeting of the Institute
was held. At this time resolutions on the teaching of statistics were approved
(see p. 472). The President reported that a War Preparedness Committee
had been appointed in the summer to study the matter of the Institute's
participation in the national defense program.² The Chairman of this Committee
submitted a preliminary report which met with approval. A plan was approved
for completing the report and circularizing it with a minimum of delay.

The matter of the organization of local sections or chapters of the Institute
was discussed but no action was taken.


¹ Professor Hotelling's address is published in the present issue of the Annals.

² The membership of the Committee is as follows:
Professor Churchill Eisenhart (Chairman), University of Wisconsin.
Professor A. T. Craig, University of Iowa.
Professor E. G. Olds, Carnegie Institute of Technology.
Captain Leslie E. Simon, Aberdeen Proving Ground.
Mr. Ralph E. Wareham, General Electric Company.

On Tuesday afternoon a session on contributed papers in Mathematical
Statistics was held jointly with the American Mathematical Society. Professor
B. H. Camp of Wesleyan University presided and the following papers
were presented.

1. Contributions to the theory of the representative method of sampling.

Dr. W. G. Madow, Department of Agriculture, Washington.

2. A generalization of the law of large numbers.

Dr. Hilda P. Geiringer, Bryn Mawr College.

3. On the problem of two samples from normal populations with unequal variances.

Professor S. S. Wilks, Princeton University.

4. Experimental determination of the maximum of an empirical function.

Professor Harold Hotelling, Columbia University.

5. Asymptotically shortest confidence intervals.

Dr. Abraham Wald, Columbia University.

6. Reduction of certain composite statistical hypotheses.

Dr. G. W. Brown, R. H. Macy and Company, Inc., New York.

7. Conception of equivalence in the limit of tests and its application to certain $\lambda$ and $\chi^2$ tests.

Professor J. Neyman, University of California.

Abstracts of these papers follow this report.

On Wednesday morning a session was held on The Theory of Probability 
with Dr. T. C. Fry of the Bell Telephone Laboratories in the chair. The
following addresses were given: 

1. On the foundations of probability theory. 

Professor R. von Mises, Harvard University. 

2. Probability as measure. 

Professor J. L. Doob, University of Illinois. 

This session was followed by an energetic discussion which was continued in an 
informal afternoon session. 

The Thursday morning session was devoted to the Theory of Statistical Estimation
with Professor Harold Hotelling as Chairman. The following addresses
were given:

1. Estimation by intervals as a classical problem in probability.

Professor J. Neyman, The University of California.

2. Statistical estimation in large samples.

Dr. Joseph F. Daly, The Catholic University of America.

On Monday at 4:15 p.m. a tea was held at the Graduate Club for members
of the mathematical organizations and their guests, and on Monday at 8:00 a 
musical performance was presented. On Tuesday at 7:00 p.m. a joint dinner 
was held for the mathematical organizations in Thayer Hall. Wednesday 
afternoon was devoted to an excursion to Franconia Notch. 

During the meeting a collection of string models of ruled surfaces was exhibited
by Professor Robin Robinson of Dartmouth College and electrical
calculation apparatus made from telephone equipment was exhibited by members
of the staff of the Bell Telephone Laboratories.


(Presented on September 10, 1940, at the Hanover meeting of the Institute)


Contributions to the Theory of the Representative Method of Sampling. 

William G. Madow, Washington, D. C.


The theory of representative sampling may be regarded as a dual sampling process, the 
first of which consists in the sampling of different random variables and the second of which 
consists in repeating several times the experiments associated with each of the different 
random variables. It follows that while the theory of sampling from finite populations 
without replacement may be required for the first process, the second leads directly into 
the theory of sampling from infinite populations. There is, however, one difference. 
Although the usual theory is concerned with the evaluation of fiducial or confidence limits
for parameters, the theory of sampling is concerned with the evaluation of fiducial or
confidence limits for, say, the mean of a sample of N, when n (N > n) of the values are known.

It is thus possible to use the usual theories of estimation in obtaining estimates of the
parameters and to allow the effects of the subsampling process to show themselves in the
different values of the fiducial limits. It is shown that the limits obtained are almost
identical with those obtained by the theory of sampling from a finite population.
Distributions of the statistics used in these limits are derived.

Besides these results, the theory is extended to the theory of sampling vectors, and
conditions are stated under which the "best" allocation of the number in a sample among several
strata is proportional to the kth roots of the generalized variance of a random vector
having k components.

A Generalization of the Law of Large Numbers. Hilda Geiringer, Bryn Mawr.

Let $V_1(x), V_2(x), \dots, V_n(x)$ be n probability distributions which are not supposed to
be independent and let $F(x_1, x_2, \dots, x_n)$ be a "statistical function" of n observations
in the sense of v. Mises, $V_i(x)$ $(i = 1, 2, \dots, n)$ indicating as usual the probability of
getting a result $\le x$ at the ith observation. Then it can be proved that under fairly
general conditions $F(x_1, x_2, \dots, x_n)$ converges stochastically toward its "theoretical
value", or in other words, that under these general conditions a great class of statistics
$F(x_1, x_2, \dots, x_n)$ is "consistent" in the sense of R. A. Fisher.

Well known particular cases of this theorem result if (a) we take for $F(x_1, x_2, \dots, x_n)$
the average $(x_1 + x_2 + \cdots + x_n)/n$ of the n observations, (b) we assume that the $V_i(x)$
are independent distributions.

On the Problem of Two Samples from Normal Populations with Unequal Variances.
S. S. Wilks, Princeton University.

Suppose $O_{n_1}$ and $O_{n_2}$ are samples of $n_1$ and $n_2$ elements from normal populations [...]
population parameters. It is therefore impossible to obtain exact confidence limits
for $a_1 - a_2$ corresponding to a given confidence coefficient. Functions of the four parameters
and four statistics are devised from which one can set up confidence limits for $a_1 - a_2$
with associated confidence coefficient inequalities.

Experimental Determination of the Maximum of an Empirical Function.

Harold Hotelling, Columbia University. 

In physical and economic experimentation to determine the maximum of an unknown 
function, for example of a monopolist’s profit as a function of price, or of the magnetic 
permeability of an alloy as a function of its composition, the characteristic procedure is to 
perform experiments with chosen values of the argument x, each of which then yields an 
observation, subject to error, on the corresponding functional value y = f(x). The values
of x need, however, to be chosen on the basis of earlier experiments in order to make the 
determination efficient. The experimentation properly proceeds, therefore, in successive 
stages, with the values used at each stage determined with the help of the earlier work. 
The question what distribution of x as a function of previous results should be used is 
discussed in this paper on the basis of various hypotheses regarding the function, and 
further criteria. In particular, a conflict is shown to exist under some conditions between
the criterion of minimum sampling variance and that calling for absence of bias.


Asymptotically Shortest Confidence Intervals. Abraham Wald, Columbia
University.

Let $f(x, \theta)$ be the probability density function of a variate x involving an unknown
parameter $\theta$. Denote by $x_1, \dots, x_n$ n independent observations on x and let $C_n(\theta)$ be a
positive function of $\theta$ such that the probability that

$$\left| \frac{1}{\sqrt{n}} \sum_{\alpha=1}^{n} \frac{\partial}{\partial\theta} \log f(x_\alpha, \theta) \right| < C_n(\theta)$$

is equal to a constant $\beta$ under the assumption that $\theta$ is the true value of the parameter.
Denote by $\theta'(x_1, \dots, x_n)$ the root in $\theta$ of the equation

$$\frac{1}{\sqrt{n}} \sum_{\alpha=1}^{n} \frac{\partial}{\partial\theta} \log f(x_\alpha, \theta) = C_n(\theta)$$

and by $\theta''(x_1, \dots, x_n)$ the root of

$$\frac{1}{\sqrt{n}} \sum_{\alpha=1}^{n} \frac{\partial}{\partial\theta} \log f(x_\alpha, \theta) = -C_n(\theta).$$

Under some weak assumptions on $f(x, \theta)$ the interval
$\delta_n(x_1, \dots, x_n) = [\theta'(x_1, \dots, x_n), \theta''(x_1, \dots, x_n)]$
is in the limit with $n \to \infty$ a shortest unbiased confidence interval¹ of $\theta$ corresponding to
the confidence coefficient $\beta$. This confidence interval is identical with that given by S. S.
Wilks in his paper "Shortest average confidence intervals from large samples," The Annals
of Mathematical Statistics, Sept. 1938. Wilks has shown that $\delta_n(x_1, \dots, x_n)$ is asymptotically
shortest in the average compared with all confidence intervals computed on the
basis of statistics belonging to a certain class C. In the present paper it has been proved
that the confidence interval in question is asymptotically shortest compared with any
arbitrary unbiased confidence interval, without any restriction to a certain class of
functions.
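
As a hedged illustration of the two root equations above, the following Python sketch works them out for Bernoulli observations. It is not taken from the paper: the choice of the Bernoulli law, the function names, and above all the large-sample approximation C_n(θ) ≈ z·sqrt(I(θ)), with I(θ) = 1/(θ(1 − θ)) the Fisher information and z the normal quantile for the chosen confidence coefficient, are assumptions made only for this example. Under them the two roots reduce to the familiar score (Wilson) limits for a proportion.

```python
import math
from statistics import NormalDist


def scaled_score(theta, successes, n):
    """(1/sqrt(n)) * d/dtheta of the Bernoulli log-likelihood."""
    return (successes - n * theta) / (theta * (1.0 - theta) * math.sqrt(n))


def score_interval_bernoulli(successes, n, confidence=0.95):
    """Roots in theta of scaled_score = +C(theta) and = -C(theta).

    Assumption of this sketch: C(theta) = z * sqrt(I(theta)), the
    large-sample normal approximation, with I(theta) = 1/(theta*(1-theta)).
    Squaring either equation gives (p_hat - theta)^2 = z^2*theta*(1-theta)/n,
    a quadratic in theta whose two roots are returned in closed form.
    """
    z = NormalDist().inv_cdf(0.5 + confidence / 2.0)
    p_hat = successes / n
    denom = 1.0 + z * z / n
    center = (p_hat + z * z / (2.0 * n)) / denom
    half = (z / denom) * math.sqrt(
        p_hat * (1.0 - p_hat) / n + z * z / (4.0 * n * n)
    )
    # The lower root solves score = +C, the upper root solves score = -C.
    return center - half, center + half, z


lower, upper, z = score_interval_bernoulli(40, 100)
print(f"theta' = {lower:.4f}, theta'' = {upper:.4f}")
```

The score is a decreasing function of θ here, so the root of the "+C" equation is the lower limit and the root of the "−C" equation is the upper limit, matching the order θ′, θ″ in the abstract.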

¹ For the definition of a shortest unbiased confidence interval see the paper by J. Neyman,
“Outline of a theory of statistical estimation based on the classical theory of probability,”
Phil. Trans. Roy. Soc. (1937).


Reduction of Certain Composite Statistical Hypotheses. George W. Brown,
R. H. Macy and Co., New York.

The results obtained make it possible to reduce a large class of composite statistical
hypotheses to equivalent simple hypotheses. The fundamental theorem established states
essentially that if two distributions give rise, in sampling, to the same distribution of the
set of differences between observations, then one distribution must be a translation of the
other, subject to a condition requiring that the characteristic function of one of the
distributions be such that any interior intervals of zeros be not too large. The result is
established by means of the functional equation

$$\varphi(t_1)\,\varphi(t_2)\,\varphi(-t_1 - t_2) = \psi(t_1)\,\psi(t_2)\,\psi(-t_1 - t_2)$$

relating the characteristic functions. Similar results are obtained for scale, and combination
of location and scale, and the corresponding situations in multivariate distributions.
This type of uniqueness theorem permits one to reduce a composite hypothesis
involving an unknown location parameter (or scale, or both) to an equivalent simple
hypothesis.
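
The easy direction of this uniqueness statement can be checked numerically. The Python sketch below is an illustration under stated assumptions and is not from the paper: it takes three independent observations from a normal law (a choice made only because its characteristic function has a simple closed form) and evaluates the joint characteristic function of the differences (x1 − x3, x2 − x3), which is the product φ(t1)φ(t2)φ(−t1 − t2). The function names are hypothetical. A translation of the distribution leaves this product unchanged, while a change of scale does not; the theorem itself concerns the harder converse.

```python
import cmath


def normal_cf(t, mean, sd):
    """Characteristic function of a normal law: exp(i*mean*t - (sd*t)^2 / 2)."""
    return cmath.exp(1j * mean * t - 0.5 * (sd * t) ** 2)


def difference_cf(t1, t2, mean, sd):
    """Joint characteristic function of (x1 - x3, x2 - x3) for three
    independent draws: phi(t1) * phi(t2) * phi(-t1 - t2)."""
    return (
        normal_cf(t1, mean, sd)
        * normal_cf(t2, mean, sd)
        * normal_cf(-t1 - t2, mean, sd)
    )


base = difference_cf(0.7, -1.3, mean=0.0, sd=2.0)
shifted = difference_cf(0.7, -1.3, mean=5.0, sd=2.0)   # translated law
rescaled = difference_cf(0.7, -1.3, mean=0.0, sd=1.0)  # different scale

print(abs(base - shifted))   # essentially zero: translation is invisible
print(abs(base - rescaled))  # clearly nonzero: scale is detected
```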

Conception of Equivalence in the Limit of Tests and Its Application to Certain
$\lambda$- and $\chi^2$-Tests. J. Neyman, University of California.

Denote by E a system of observable variables and by N the number of independent
observations of those variables to be used for testing a certain statistical hypothesis H
against a set $\Omega$ of admissible simple hypotheses h. Let further $T_1(N)$ and $T_2(N)$ be two
different tests of H using the same number N of observations. Consider the probability
$P_N(h)$, calculated on any admissible simple hypothesis h, of the two tests contradicting
themselves.

Definition: If, whatever be $h \in \Omega$, the probability $P_N(h)$ tends to zero as N is indefinitely
increased, then the two tests are said to be equivalent in the limit.

Consider a number s of series of independent trials and denote by $E_{i1}, E_{i2}, \dots, E_{im_i}$
all the $m_i$ possible and mutually exclusive outcomes of each of the trials forming the ith
series. Let $p_{ij}$ be the probability of $E_{ij}$, $n_i$ the total number of trials in the ith series,
and $n_{ij}$ the number of these which give the outcome $E_{ij}$.

Suppose that it is desired to test a composite hypothesis H concerning all the probabilities
$p_{ij}$ and consisting of the assumption that any one of them is a given linear function
of some t independent parameters $\theta_1, \dots, \theta_t$, so that

$$(1)\qquad p_{ij} = a_{ij0} + a_{ij1}\theta_1 + \cdots + a_{ijt}\theta_t$$

where the coefficients $a_{ijk}$ are known. The main result of the paper is then that the $\lambda$-test
of the above hypothesis H, tested against the set $\Omega$ of alternatives ascribing to the $p_{ij}$
any non-negative values, is equivalent in the limit to the test consisting of rejecting H
when the minimum of the expression

$$(2)\qquad \chi'^2 = \sum_{i=1}^{s} \sum_{j=1}^{m_i} \frac{(n_{ij} - n_i p_{ij})^2}{n_{ij}}$$

calculated with respect to unrestricted variation of the $\theta$'s, exceeds the tabled value of $\chi^2$
corresponding to the chosen level of significance $\alpha$ and to the number of degrees of freedom

$$\sum_{i=1}^{s} m_i - s - t.$$

It will be noticed that the expression (2) differs from the usual $\chi^2$ in the denominator
of each term.

As an example of the application of the test based on (2), consider the case where M
varieties of sugar beet are tested for resistance to a certain disease in an experiment
arranged in N randomized blocks. Denote by n the number of beets selected at random
for inspection from each plot and by $n_{ij}$ the number of those of the ith variety from the
plot in the jth block which are found to be infected. Denote further by $p_{ij}$ the proportion
of infected beets of the ith variety in the plot in the jth block. The hypothesis that the
effects of variety and of block are additive is expressed by $p_{ij} = p + V_i + B_j$ with
$\sum_i V_i = \sum_j B_j = 0$. To test this hypothesis we may use (2) which in this particular case
reduces itself to

$$(3)\qquad \chi'^2 = \sum_{i=1}^{M} \sum_{j=1}^{N} w_{ij}\,(q_{ij} - p_{ij})^2$$

with $w_{ij} = n^3 / \bigl(n_{ij}(n - n_{ij})\bigr)$, $q_{ij} = n_{ij}/n$. The minimum $\chi_0'^2$ of $\chi'^2$ is found by solving a
set of equations which are linear in $p$, $V_i$, $B_j$ and the comparison of $\chi_0'^2$ with the tabled
value corresponding to $(M - 1)(N - 1)$ degrees of freedom will tell us whether we are
likely to be very wrong in assuming additivity or not. In the favorable case we may
next proceed similarly to test another hypothesis that there is no differentiation between
the varieties, so that $V_1 = V_2 = \cdots = V_M = 0$.
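
The arithmetic behind expressions (2) and (3) can be sketched in a few lines of Python. The sketch assumes the reading of the abstract in which expression (2) carries the observed count n_ij in the denominator of each term, and in which a plot of n beets with n_ij infected is a series with two outcomes (infected, not infected). The function names and the sample figures (20 beets, 6 infected, hypothetical proportion 1/4) are made up for illustration. Exact fractions are used so that the reduction from (2) to (3) can be checked without rounding.

```python
from fractions import Fraction


def modified_chi_square(counts, probs):
    """Expression (2): sum of (n_ij - n_i * p_ij)^2 / n_ij, with the
    OBSERVED count n_ij in the denominator (every n_ij must be positive)."""
    total = Fraction(0)
    for row_counts, row_probs in zip(counts, probs):
        n_i = sum(row_counts)
        for n_ij, p_ij in zip(row_counts, row_probs):
            total += (n_ij - n_i * p_ij) ** 2 / Fraction(n_ij)
    return total


def pearson_chi_square(counts, probs):
    """Usual chi-square: same numerators, EXPECTED count n_i * p_ij below."""
    total = Fraction(0)
    for row_counts, row_probs in zip(counts, probs):
        n_i = sum(row_counts)
        for n_ij, p_ij in zip(row_counts, row_probs):
            total += (n_ij - n_i * p_ij) ** 2 / (n_i * p_ij)
    return total


def binomial_form(n, infected, p):
    """One term of expression (3): w * (q - p)^2 with
    w = n^3 / (n_ij * (n - n_ij)) and q = n_ij / n."""
    w = Fraction(n ** 3, infected * (n - infected))
    q = Fraction(infected, n)
    return w * (q - p) ** 2


# One plot: 20 beets inspected, 6 infected, hypothetical proportion 1/4.
p = Fraction(1, 4)
print(modified_chi_square([[6, 14]], [[p, 1 - p]]))  # 5/21
print(binomial_form(20, 6, p))                       # 5/21, same value
print(pearson_chi_square([[6, 14]], [[p, 1 - p]]))   # 4/15
```

The degrees of freedom quoted in the abstract follow from the same reading: with MN plots of two outcomes each, the sum of the m_i less the number of series is MN, and the additive hypothesis has 1 + (M − 1) + (N − 1) free parameters, leaving (M − 1)(N − 1).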

Empirical Comparison of the “Smooth” Test for Goodness of Fit with the 
Pearson's Test. J. Neyman, University of California. 

I 

In a previous publication 5 the author has deduced a test for goodness of fit, described 
as the “smooth test” or the p test, applicable to cases where the hypothesis tested H 
is simple The test is so devised as to be particularly sensitive to departures from H 
which are “smooth 11 in the sense explained in detail in the publication quoted. Whether 
the test so devised does present any advantage over the usual x l test depends on how 
frequently we meet, in practice, cases where the hypotheses alternative to the one tested 
are actually smooth 

The present investigation was undertaken with the object of obtaining some information
on this point. For that purpose a number of cases described in the literature where there
was a question of testing that some observable variable x follows some perfectly specified
distribution p(x) were analyzed. Of all such cases, the ones where there were a priori
theoretical reasons to believe that p(x) could not possibly represent the true distribution
of x and, at the most, it could be considered as only an approximation to the true
distribution were selected.

It was assumed that the departures from the hypothetical distributions are typical of
those that may be met in practice when no definite information as to the actual state of
affairs is available. The hypothesis of goodness of fit was tested both by means of the
χ² and by the fourth order smooth test. Out of the 130 cases studied the two tests were
in perfect agreement eight times. Out of the remaining 122 cases the smooth test proved
to be more sensitive than the χ² in 70 cases and the χ² better than the smooth test in 62
cases. We may further compare the tests by counting those cases where one of them
detected the falsehood of the hypothesis tested at a given level of significance while the
other failed to do so. At the level of significance .05 the χ² test rejected the hypothesis
tested 13 times, while the probability level of the smooth test was > .05. The reverse was
true in 17 cases. At the level of significance .01 the corresponding figures are 5 and 14,
again in favor of the smooth test.
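For readers who wish to repeat such a comparison, a fourth order smooth test statistic can be sketched as follows. This is an illustration under stated assumptions, not the computation used in the investigation: it takes the usual form of the statistic, the sum over r = 1, ..., k of the squared normalized sums of orthonormal Legendre polynomials evaluated at y = F0(x), where F0 is the completely specified distribution function under the hypothesis tested. The function name is hypothetical.

```python
import numpy as np
from numpy.polynomial import legendre

def smooth_test_statistic(y, order=4):
    """Smooth test statistic of the given order.

    y     : values of the hypothetical distribution function at the
            observations, y_i = F0(x_i), each lying in [0, 1]
    order : number of orthonormal Legendre components (4 = fourth order)
    """
    y = np.asarray(y, dtype=float)
    n = y.size
    psi2 = 0.0
    for r in range(1, order + 1):
        coeffs = np.zeros(r + 1)
        coeffs[r] = 1.0                          # selects the polynomial P_r
        # Legendre polynomial shifted to [0, 1] and scaled to unit variance
        pi_r = np.sqrt(2 * r + 1) * legendre.legval(2 * y - 1, coeffs)
        u_r = pi_r.sum() / np.sqrt(n)
        psi2 += u_r**2
    return float(psi2)
```

When the hypothesis tested is true, the statistic is referred, for large samples, to the chi-square distribution with `order` degrees of freedom.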


² J. Neyman, “‘Smooth Test’ for Goodness of Fit,” Skandinavisk Aktuarietidskrift,
1937, pp. 149-199.



REPORT OF THE WAR PREPAREDNESS COMMITTEE OF THE 
INSTITUTE OF MATHEMATICAL STATISTICS 


The generally recognized functions of a statistician are the calculation of
averages, percentages, and index numbers; the construction of bar graphs and
pie diagrams; and the compilation of data in general. His other activities
are less widely known. In particular, the recent advances in mathematical
statistics are known to a relatively small proportion of the persons occupying
responsible positions in academic life, in industry, and in government. The
mathematical statistician, in fact, is concerned chiefly with the interpretation
of data through the use of probability theory; his is the science of reasoning
from a part to the whole, and of prediction; and to him falls the task of stating
the conditions under which such inferences are possible, of devising means of
testing whether these conditions are satisfied, and of evaluating the probability
that such ‘uncertain inferences’ are correct in specific instances. Furthermore,
it is his responsibility to so plan the lay-out of experiments and the
conduct of surveys that the data they yield will contain the maximum information
on the points at issue and be amenable to unambiguous statistical
interpretation.

Because of the functions which the mathematical statistician can perform his 
services should be of value to the National Defense Program in the following 
fields: 


I. Quality Control and Specification. The functions of a mathematical 
statistical nature connected with quality control and specification of articles 
produced by mass production are: 

(1) Tests of randomness. These are important because statistical methods
of inference are strictly valid only for random samples.

(2) The use of probability theory in predicting the outcome of future repetitions
of an operation which is in a state of statistical control.¹ The evaluation of the
probability that the quality of a piece of product will lie within any previously
specified tolerance limits as long as a state of statistical control is maintained,
and the development of sampling inspection techniques are examples of this
function.


¹ A repetitive operation, such as a production process, is said to be in a state of statistical
control when it produces a sequence of observations which exhibit the property of randomness.
An important aspect of quality control is the improvement of quality which comes
as the result of an effort to reduce a manufacturing process to a state of statistical control.
Furthermore, when this state of control is attained it is possible to obtain a reduction in
cost of inspection, a reduction in cost of rejections, a reduction in tolerance limits where
quality measurement is indirect, and the attainment of uniform quality even though the
inspection test is destructive.

(3) Representative sampling. When a repetitive operation such as a production
process is not in a state of statistical control, it is not possible to make
valid inferences about the quality of a lot from an examination of a sample
from the lot unless the sampling process is one of random selection within
“strata” in accordance with the principles of representative sampling.

(4) Analysis of variance. Reference is made here to the technique whereby
the total variability of a product of an operation which is in a state of statistical
control can be decomposed into components associated with the various
sub-operations involved.

(5) Correlation methods. When a direct measurement of quality is extremely
costly, it is sometimes advisable to use as an indirect measurement of quality
the value of some character less costly to measure which is highly correlated
with quality.

(6) Specification of quality as a variable. Statistical theory, including tests
for randomness, must be taken into account in writing quality specifications if
the consumer is to be protected against the vagaries of sampling and the producer
safeguarded from the incurring of penalties of an unjust chance.
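As an illustration of what a test of randomness of the kind listed under (1) may look like, the sketch below counts runs above and below the median of a sequence of measurements and compares the count with its expectation under randomness. The choice of this particular runs test, the function name, and the normal approximation are assumptions of the sketch; the report does not name a specific test.

```python
import math
import statistics

def runs_test(x):
    """Runs above and below the median.

    Returns (runs, expected_runs, z), where z is the normal deviate of the
    observed number of runs under the hypothesis of randomness.
    """
    med = statistics.median(x)
    signs = [v > med for v in x if v != med]    # values equal to the median are dropped
    n1 = sum(signs)                             # observations above the median
    n2 = len(signs) - n1                        # observations below the median
    runs = 1 + sum(a != b for a, b in zip(signs, signs[1:]))
    n = n1 + n2
    expected = 2 * n1 * n2 / n + 1
    variance = 2 * n1 * n2 * (2 * n1 * n2 - n) / (n**2 * (n - 1))
    z = (runs - expected) / math.sqrt(variance)
    return runs, expected, z
```

Too few runs (a large negative z) suggest a trend or a shift in level; too many suggest a systematic alternation. Either is evidence that the operation is not in a state of statistical control.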

II. Sampling Surveys. The importance of conducting sampling surveys 
in accordance with the principles of representative sampling is well established. 
It is quite possible that such surveys and partial censuses will be needed in 
connection with the National Defense Program in order to determine the 
frequency and location of individuals possessing special traits, e.g. persons 
capable of withstanding the rigours of dive bombing, or persons possessing 
types of color blindness which render them valuable as observers who can 
detect camouflage, etc. The “problem of sizes” connected with Stores and 
Supplies—see below—may require careful preliminary surveys. Also, surveys 
may be needed to evaluate the effects of various types of propaganda. 

III. Experimentation of Various Kinds. The mathematical statistician
can be of service in connection with experimentation of various kinds undertaken
as a part of the National Defense Program since the following aspects
of experimentation are of a mathematical statistical nature:

(1) Randomization. Since statistical tests for the existence of differences
between samples, of correlation, etc. are strictly valid only for random samples,
the operation of randomization is of paramount importance in “the comparison
of new designs, new materials or alloys, study of contact phenomena under
different conditions, corrosion of materials under different atmospheric conditions,
and field trial of equipment, to mention only a few.” If randomization
is not undertaken, observed differences between designs, for instance, may have
arisen from non-random assignable differences in the material presented. Furthermore,
the validity of tests for significant differences between the effects
of various designs rests upon the condition that the variability observed in
the effects of each design be of random character and free from trends and
non-random shifts in magnitude; i.e. the operation of determining the effects
of each design must be in a state of statistical control, to use a phrase employed
in quality control.

(2) Experimental design. Without careful attention to the lay-out of an
experiment, the data it yields may be difficult and even impossible to interpret.
Therefore, the principles of experimental design set forth by R. A. Fisher and
his followers are of great importance, as are also the special experimental
arrangements which have been devised to cope with many of the more usual
difficulties met in practice.
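The operation of randomization within a simple experimental arrangement can be sketched as follows. The example assumes a randomized block lay-out in which every design (treatment) appears once in each block, in an order drawn at random separately for each block; the function name and the treatment labels are hypothetical.

```python
import random

def randomized_blocks(treatments, n_blocks, seed=None):
    """Return one random ordering of the treatments for each block."""
    rng = random.Random(seed)          # a seed makes the lay-out reproducible
    layout = []
    for _ in range(n_blocks):
        order = list(treatments)
        rng.shuffle(order)             # independent random order within the block
        layout.append(order)
    return layout

# Example: three designs compared in four blocks of material.
for block, order in enumerate(randomized_blocks(["A", "B", "C"], 4, seed=1), 1):
    print(block, order)
```

Because the order is drawn afresh in each block, assignable differences in the material within a block cannot systematically favor any one design.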

IV. Personnel Selection. The allocation of individuals to places where
they can be of greatest value in the National Defense Program will undoubtedly
require tests for mental and physical traits. Although the development
and analysis of such tests is largely in the hands of psychometric groups, the
use of methods of multivariate statistical analysis in such work renders this
field one in which mathematical statistics ought to play an important role.


It is in the above four fields that there is special need for the training and 
endowments of the mathematical statistician. He can also render valuable 
assistance in the following fields: 


V. Stores and Supplies.

(1) Problem of sizes. Preliminary surveys are likely to prove useful in
ascertaining the relative frequencies of demand for the respective sizes of clothing,
etc. in different parts of the country.

(2) Development of procedures for charting the day to day location and movement
of stores and supplies.

(3) Problem of replacement of parts and equipment. In many cases it is more
economical to make replacement at statistically determined times than to wait
for complete failure.


VI. Transportation and Communication. Probability theory has shown
its usefulness in peace time in handling “traffic” problems that arise in telephone
and telegraph communication, electric power distribution, etc. No doubt it
will find corresponding application to problems in these fields arising out of the
National Defense Program.

VII. Gunnery and Bombing. Although there is a need in connection with
artillery fire for further development of methods of estimating standard deviations
from successive differences in order to minimize the biases arising from
slowly changing conditions during the period of firing, the principles of artillery
fire are quite firmly established and the relatively new science of bombing is
likely to present greater opportunities for the application of the methods of
mathematical statistics. For instance, in evaluating bombing techniques
there is need of statistical methods to separate the constant biases from the
random variability.
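The idea of estimating a standard deviation from successive differences can be sketched as follows. The sketch assumes the usual mean square successive difference: for independent observations with a common variance, the expected squared difference between consecutive observations is twice that variance, and a slow drift in the mean contributes little to these differences. The function name and the sample data are hypothetical.

```python
import statistics

def variance_from_successive_differences(x):
    """Estimate the variance as half the mean squared successive difference."""
    n = len(x)
    if n < 2:
        raise ValueError("at least two observations are required")
    mean_sq_diff = sum((b - a) ** 2 for a, b in zip(x, x[1:])) / (n - 1)
    return mean_sq_diff / 2.0

# A sequence whose level drifts steadily during the period of firing.
drifting = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0]
print(variance_from_successive_differences(drifting))   # 0.5
print(statistics.variance(drifting))                    # much larger: inflated by the drift
```

The ordinary sample variance treats the drift as random variability, whereas the estimate from successive differences is affected only by the change between one round and the next.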
VIII. Meteorology. The extent to which statistical methods are being
employed in meteorology can be seen from an examination of the Monthly
Weather Review Supplement No. 39, issued April 1940, and entitled “Reports
on Critical Studies of Methods of Long-Range Weather Forecasting.” There
seems to be excellent opportunity here for the application of methods of multivariate
analysis and for the development and uses of methods applicable to
serially correlated data. Such work would be of value in National Defense
so far as it would enable the forecasting of conditions suitable for launching an
attack.

IX. Medicine. The National Defense Program will probably require the
preparation and storage of hormone substances, toxic compounds, drugs, and other
medicinal supplies. Since many such are examined for potency, toxicity, etc.
by means of animal assays, there will be considerable opportunity here for
the sound application of mathematical statistics in planning and interpreting
these bioassays.

In nearly all of the above activities the application of mathematical statistics 
is likely to encounter two major difficulties: 

(1) Obtaining an adequate trial of the methods of mathematical statistics. 

(2) Supplying persons to occupy key positions in the application of mathematical
statistics in a given field: persons competent in mathematical statistics
and who possess a sound background in the field of application.

In some of the above activities, e.g. Quality Control, there will be the further 
difficulty of 

(3) Supplying the vast number of slightly trained workers who will gather 
the data and perform the analyses. 

It is with these difficulties in mind that the Committee recommends that the 
Institute 

(1) Prepare a register of Institute members, stating for each member his
background, interests, and experience so far as these relate to mathematical
statistics and its applications;²

(2) Appoint a committee to handle inquiries concerning personnel qualified
to deal with particular projects;

(3) Cooperate to the fullest extent in matters pertaining to quality control
and specification with the Joint Committee for the Development of Statistical
Applications in Engineering and Manufacturing, of which the Institute is a
sponsor;³

(4) Undertake such steps as are feasible which will lead to cooperation with
other organizations having interests similar to those of the Institute, e.g. the
American Statistical Association, the Psychometric Society, and the Econometric
Society.

(5) Establish contact with the National Defense Research Committee headed
by Dr. Vannevar Bush and coordinate the Institute’s activities with those
of this national Committee.


² The preparation of this register should be coordinated with any similar undertaking
sponsored by the National Roster of Scientific and Specialized Personnel, National
Resources Planning Board, Executive Office of the President, Washington, D. C.

³ We suggest the following as possible undertakings in a cooperative program with the
Joint Committee.

(1) Requesting statements regarding the potential contribution to National Defense
of statistical methods in quality control and specification from men prominent in industry
who are familiar with recent developments in quality control. Such men would
be asked to give, where possible, concrete evidence of the value of such methods in their
experience, evidence which would be helpful in securing authoritative acceptance of
[illegible] at various industrial centers. (Captain Simon of our Committee is preparing an
[illegible]) The arrangement of local [illegible] universities in a few large industrial
centers [illegible] in local industries who [illegible] serve as chairman. To such a meeting
would [illegible] methods to their problems, and [illegible] quality control.


In conclusion, we feel that as an organized group the Institute’s primary
function in relation to the National Defense Program should be to serve as a
reservoir of specialists, experienced in the use of the methods of mathematical
statistics, who can direct the use of these methods and be of assistance in the
development of new techniques as needed. As a secondary, but equally important
function, the Institute is in a position to supervise, and perhaps to
undertake through the activities of its individual members, the training in
mathematical statistics of the individuals who will be needed in the application
of whatever statistical programs of the type noted above are undertaken in
connection with the National Defense Program. It is recommended, therefore,
that the Institute’s interest in the above activities, and its willingness to be called
upon, be adequately publicized, possibly by sending copies of this report to various
members of the Government, such as the Chief Signal Officer and the Coordinator
of National Defense Purchases and also to the secretaries of appropriate
organizations, such as the American Standards Association, with the request
that they advise the Institute of any specific action they feel the Institute
should take.

A. T. Craig
E. G. Olds
L. E. Simon
R. E. Wareham
C. Eisenhart, Chairman





